Advanced ETL: Embedding Integration Services Ashvini Sharma Development Lead DAT411 Microsoft...

Date post: 04-Jan-2016
Transcript
Page 1: Advanced ETL: Embedding Integration Services Ashvini Sharma Development Lead DAT411 Microsoft Corporation Sergei Ivanov Technical Lead DAT411 Microsoft.

Advanced ETL: Embedding Integration Services

Ashvini Sharma
Development Lead
DAT411
Microsoft Corporation

Sergei Ivanov
Technical Lead
DAT411
Microsoft Corporation

Prerequisites

Knowledge of Integration Services
Knowledge of Data Flow Functionality
Level 400. Really.

Objectives

Introduction to SSIS programming model
Learn how to integrate with dynamic metadata
Learn how to utilize data cleansing functionality in your apps

Integration Services

SSIS Terminology

Package
    Tasks
    Precedence Constraints
    Connection Managers
    Containers
Data Flow Task
    Components – Source Adapters, Transformations, Destination Adapters
    Paths

Application Overview

Get data from an Excel file
Provide fuzzy cleansing for certain text fields
    FirstName, LastName
Save cleaned data in another Excel file
Look at finished application first, then go through several iterations to build it

Application

SSIS is embeddable

SQL Server uses SSIS
    SMO
    Maintenance Plans
Other (non SQL) products in development are using SSIS
Writing your own UI is possible
    SSIS designer, Management Studio, Import/Export Wizard, Migration Wizard
Uses no secret APIs
    Enumerating/adding/removing/changing/listening/scheduling/…
Considering releasing Migration Wizard in Shared Source
Digital signing enables tamper resistance
Several customers doing metadata driven package development

Pipeline Metadata

Pipeline engine requires static metadata
Early design decision
Buffers laid out during pre execute
    Strict data types
    Cannot map columns during execution
Designer debugging expects design time metadata at execution time
Configured (dynamic) queries must resolve to design time metadata at runtime
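
The last point can be illustrated with a small Python sketch. It is a toy model, not the SSIS API: the `Column` type, the `metadata_mismatches` helper, and the sample type names are all hypothetical. It only shows the idea of comparing the columns a reconfigured query returns at run time against the metadata captured at design time.

```python
from typing import List, NamedTuple


class Column(NamedTuple):
    """Toy stand-in for one column of pipeline metadata (hypothetical)."""
    name: str
    data_type: str
    length: int


def metadata_mismatches(design: List[Column], runtime: List[Column]) -> List[str]:
    """Describe every difference between design-time and run-time schemas."""
    problems = []
    runtime_by_name = {col.name: col for col in runtime}
    # Every design-time column must still exist with the same type and length.
    for col in design:
        actual = runtime_by_name.get(col.name)
        if actual is None:
            problems.append(f"missing column: {col.name}")
        elif (actual.data_type, actual.length) != (col.data_type, col.length):
            problems.append(f"type changed: {col.name}")
    # Columns that appear only at run time are reported as well.
    design_names = {col.name for col in design}
    for col in runtime:
        if col.name not in design_names:
            problems.append(f"unexpected column: {col.name}")
    return problems


# Metadata captured at design time (sample values are made up).
design_time = [Column("FirstName", "wstr", 50), Column("LastName", "wstr", 50)]

# A reconfigured query that still resolves to the same metadata.
same_shape = [Column("FirstName", "wstr", 50), Column("LastName", "wstr", 50)]

# A query that widens one column and adds another.
changed = [
    Column("FirstName", "wstr", 100),
    Column("LastName", "wstr", 50),
    Column("Age", "i4", 0),
]

print(metadata_mismatches(design_time, same_shape))
print(metadata_mismatches(design_time, changed))
```

The sketch treats every difference as a mismatch. How the real engine reacts to each kind of difference is outside what the slide states.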

Dynamic Metadata

Scenarios
    Source schema changes/not known until execution
    Metadata driven ETL processes
Handling dynamic metadata
    Generate data flows dynamically

Creating Packages

Creating Packages

From scratch through object model
    Create all package elements from scratch
    Fast, small, efficient
    Harder to evolve the application
From template package
    Adjust only what needs adjusting after loading the template package
    Need to embed potentially large template file
    Easier to evolve the application
    Digital signing detects user changes

Components Terminology

Component
    Input
        Input Columns (Only data referenced by component)
        Virtual Input Columns (All available data produced by upstream components – used at design time for selecting input columns)
        External Metadata Columns (Schema snapshot)
    Output
        Output Columns (Produced data)
        External Metadata Columns (Schema snapshot)
LineageID uniquely identifies a column
    Every output column gets a new Lineage ID
Column Mapping
    Sources: ExternalColumn<->OutputColumn
    Transforms: InputColumn<->OutputColumn
    Destinations: InputColumn<->ExternalColumn
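
The lineage and mapping rules above can be sketched in a few lines of Python. This is a toy model, not the SSIS object model: `ToyPipeline`, `OutputColumn`, and `select_input_columns` are hypothetical names, and the column names are sample values. It assumes that an input column refers to the upstream output column by its lineage ID.

```python
import itertools
from typing import Dict, List, NamedTuple


class OutputColumn(NamedTuple):
    """Toy output column: a name plus the lineage ID that identifies it."""
    name: str
    lineage_id: int


class ToyPipeline:
    """Hands out lineage IDs; a stand-in for the engine, not the real thing."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)

    def new_output_column(self, name: str) -> OutputColumn:
        # Every output column gets a new lineage ID.
        return OutputColumn(name, next(self._next_id))


def select_input_columns(virtual_input: List[OutputColumn],
                         wanted: List[str]) -> Dict[str, int]:
    """Pick the referenced columns out of everything upstream makes available.

    Each selected input column keeps the lineage ID of the upstream column.
    """
    return {col.name: col.lineage_id for col in virtual_input if col.name in wanted}


pipeline = ToyPipeline()

# Source: each external column is mapped to a new output column.
external_columns = ["FirstName", "LastName", "City"]
source_outputs = [pipeline.new_output_column(name) for name in external_columns]

# Transform: references two of the available columns and produces a new one.
transform_inputs = select_input_columns(source_outputs, ["FirstName", "LastName"])
cleaned = pipeline.new_output_column("FirstName_clean")

print(source_outputs)
print(transform_inputs)
print(cleaned)
```

Here `source_outputs` plays the role of the virtual input for the transform, and `transform_inputs` holds only the columns the transform actually references.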

Pipeline Programming Model

ComponentMetadata
    Provided for all components by the engine automatically
    Manages metadata and persistence for the component
    Contact information for unregistered components
    Helps delay creation of components until necessary
Runtime Connection Collection
    Connection managers used by the component

[Diagram labels: ComponentMetaData, Inputs, Outputs, Component, RCC]

Configuring Data Flows

Using Fuzzy transforms

SSIS As A Source

ETL processes typically encode complex business rules
Reuse is important
    One version of the truth
    Updates in one place
Leverage advantages of SSIS: scalability, manageability, visual building of complex processes, etc.

SSIS Source Implementation

Implements IDbConnection
    ConnectionString is the command line args to dtexec.exe
Command
    CommandText is the name of the DataReaderDest component in package
    ExecuteReader runs the package when asked for data, returns IDataReader
    Supports SchemaOnly also
DataReaderDest implements IDataReader
    Gets the first buffer and waits for data request

[Diagram labels: Microsoft.SqlServer.Dts.DtsClient, Data Reader Destination Component]
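
The "runs the package when asked for data" and "waits for data request" behaviour can be pictured with Python generators. This is only an analogy under stated assumptions, not the DtsClient API: `ToyCommand`, `run_package`, the package file name, and the sample rows are all hypothetical, and the connection string is stored but never parsed.

```python
from typing import Dict, Iterator, List

Row = Dict[str, str]


def run_package(log: List[str]) -> Iterator[List[Row]]:
    """Toy package: produces buffers of rows, pausing until the next is requested."""
    log.append("package started")
    yield [{"FirstName": "Ann"}, {"FirstName": "Bob"}]
    log.append("second buffer produced")
    yield [{"FirstName": "Cy"}]
    log.append("package finished")


class ToyCommand:
    """Toy counterpart of a command whose text names the reader destination."""

    def __init__(self, connection_string: str, command_text: str,
                 log: List[str]) -> None:
        self.connection_string = connection_string  # dtexec-style arguments
        self.command_text = command_text            # name of the destination
        self.log = log

    def execute_reader(self) -> Iterator[Row]:
        # Generator: the package does not start until the first row is requested.
        for buffer in run_package(self.log):
            for row in buffer:
                yield row


log: List[str] = []
command = ToyCommand('/FILE "Cleanse.dtsx"', "DataReaderDest", log)

reader = command.execute_reader()
log_before_read = list(log)   # nothing has run yet

first_row = next(reader)      # first request starts the package
log_after_first = list(log)

remaining_rows = list(reader) # draining the reader lets the package finish
print(log_before_read, log_after_first, log)
```

The consumer drives execution: each request for data pulls the next row, and a new buffer is produced only once the previous one has been read.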

Putting it together

Summary

Programming SSIS is straightforward
Several embedding options exist
SSIS can handle flexible metadata
SSIS provides rich functionality and high performance

Resources

Embedding Reporting and Analysis in your Smart Client App
    DAT313 – 502AB 5:00PM
Samples installed by setup
Community site, run by MVPs
    http://www.sqlis.com
Interact with product team on MSDN Forums
    http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=80
Webcasts, training, blog links, books, …
    http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

