Vinod Kumar M
Technology Evangelist – DB and BI
Microsoft
www.ExtremeExperts.com
Objectives and Takeaways
A high level view
Design considerations
How to measure performance
Performance implications of architecture
Manageability aspects of SSIS
Deployment tips
Out of scope
Prescriptive guidance for specific situations
Agenda
Quick Introduction
Understanding Buffers and Memory
OVAL Concept Detailed
Component Specific Notes
Manageability Features
Deployment Considerations
Introduction
SSIS Life Cycle tools
Design the SSIS Package
Business Intelligence Development Studio (Visual Studio)
Migration wizard for pre-SQL Server 2005 packages
Version Control Integration (VSS)
Deployment/Execution
Deployment Utility to copy packages
Command-line execution (dtexec.exe), with dtexecui.exe as its graphical front end
Flexible Configuration Options
Supportability
Rich per-package logging
SQL Management Studio for monitoring running packages and organizing stored packages
Checkpoint - Restartability
Deep dive - Performance
Buffers and Memory
Buffers based on design time metadata
The width of a row determines the size of the buffer
Smaller rows = more rows in memory = greater efficiency
Memory copies are expensive!
A buffer might have placeholder columns filled by downstream components
Pointer magic where possible
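The link between row width and rows per buffer can be sketched roughly as follows. The 10 MB and 10,000-row figures are assumed to be the Data Flow task defaults for DefaultBufferSize and DefaultBufferMaxRows; the function name is hypothetical, and the real engine weighs more inputs than these two settings.

```python
# Rough sketch only: the real engine considers more than these two settings.
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # bytes; assumed DefaultBufferSize
DEFAULT_BUFFER_MAX_ROWS = 10_000         # rows; assumed DefaultBufferMaxRows


def estimated_rows_per_buffer(row_width_bytes,
                              buffer_size=DEFAULT_BUFFER_SIZE,
                              max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Estimate how many rows of a given width fit in one buffer."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    # Whichever limit is hit first (byte size or row count) caps the buffer.
    return min(max_rows, buffer_size // row_width_bytes)


if __name__ == "__main__":
    for width in (200, 2_000, 20_000):
        print(width, estimated_rows_per_buffer(width))
```

Under these assumptions, widening a row from 2,000 to 20,000 bytes cuts the rows per buffer by roughly a factor of ten, which is the "smaller rows = more rows in memory" point in numbers.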
Component Types
Row based (synchronous outputs)
Logically works at a row level
Buffer reused
Examples: Data Conversion, Derived Column
Partially blocking (asynchronous outputs)
May logically work at a row level
Data copied to new buffers
Examples: Merge, Merge Join, Union All
Blocking (asynchronous outputs)
Needs all input buffers before producing any output rows
Data copied to new buffers
Examples: Aggregate, Sort
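The difference between a row-based and a blocking component can be illustrated with a small sketch. This is an analogy in plain Python, not SSIS code: lists of dicts stand in for buffers, and the function names and column names are hypothetical.

```python
def derived_column(buffer):
    """Row based (synchronous): update rows in place and reuse the same buffer."""
    for row in buffer:
        row["total"] = row["qty"] * row["price"]
    return buffer  # same list object, no copy


def sort_rows(buffers, key):
    """Blocking (asynchronous): consume every input buffer, then emit new ones."""
    # All input must be present before the first output row can be produced.
    all_rows = [dict(row) for buf in buffers for row in buf]  # copied to new storage
    all_rows.sort(key=lambda r: r[key])
    return [all_rows]  # a brand-new output buffer


if __name__ == "__main__":
    b1 = [{"qty": 2, "price": 5}, {"qty": 1, "price": 3}]
    b2 = [{"qty": 4, "price": 1}]
    same = derived_column(b1)
    print(same is b1)
    out = sort_rows([same, derived_column(b2)], key="total")
    print([r["total"] for r in out[0]])
```

The row-based function hands back the buffer it received, while the sort has to hold everything and copy it, which is where the memory cost of blocking components comes from.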
CPU Utilization
Execution Tree
Starts from a source or an async output
Ends at a destination or an input that has no sync outputs
Each Execution Tree can get a worker thread
EngineThreads property (on the Data Flow task) to control parallelism
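As a minimal sketch of the rule above, the following splits a simple linear data flow into execution trees. It assumes a single path with no branches; the function name and the `(name, has_async_output)` representation are hypothetical, not part of any SSIS API.

```python
def split_into_execution_trees(components):
    """Split a linear flow into execution trees.

    components: ordered (name, has_async_output) pairs; the first is the source.
    """
    trees, current = [], []
    for index, (name, has_async_output) in enumerate(components):
        current.append(name)
        is_last = index == len(components) - 1
        if has_async_output and index > 0 and not is_last:
            # The async component's input ends this tree;
            # its output starts the next one.
            trees.append(current)
            current = [name]
    trees.append(current)
    return trees


if __name__ == "__main__":
    flow = [
        ("Flat File Source", True),
        ("Derived Column", False),
        ("Sort", True),
        ("OLE DB Destination", False),
    ]
    for tree in split_into_execution_trees(flow):
        print(" -> ".join(tree))
```

In this example the Sort closes the first tree and opens the second, so the flow has two trees that could each be given a worker thread.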
Performance Strategy
Use OVAL to identify the factors affecting data integration performance…
Operations
What logic should be applied to the data?
Volume
How much data must be processed?
Application
Which app is best suited to these operations on this volume of data? For example, use SQL Server or SSIS for sorting data?
Location
Where should the app run? For example, on a shared server, or on a standalone machine?
An OVAL Example—Loading a Text File
Simple scenario…
Interesting performance considerations!
Text file on Server 1
SQL Server on Server 2
Operations
Understand all operations performed
1. Open a transaction on SQL Server
2. Read data from the text file
3. Load data into the SSIS data flow
4. Load the data into SQL Server
5. Commit the transaction
Beware of hidden operations
Data conversion in either step 3 or 4
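The five operations can be walked through in a small sketch. This is an illustration only: `sqlite3` stands in for SQL Server, a two-column comma-separated file is assumed, and the table and function names are hypothetical. The string-to-integer conversion in step 3 is the kind of hidden operation worth noticing.

```python
import csv
import io
import sqlite3


def load_text_file(text_file, conn):
    """Follow the five operations; sqlite3 stands in for SQL Server here."""
    conn.execute("BEGIN")                                  # 1. open a transaction
    try:
        reader = csv.reader(text_file)                     # 2. read the text file
        # 3. load into the flow; note the hidden conversion from str to int
        rows = [(int(rec[0]), rec[1]) for rec in reader]
        conn.executemany(                                  # 4. load into the database
            "INSERT INTO target (id, name) VALUES (?, ?)", rows)
        conn.execute("COMMIT")                             # 5. commit the transaction
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    sample = io.StringIO("1,alpha\n2,beta\n")
    print(load_text_file(sample, conn))
```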
Volume
Reduce where possible
Don’t push unneeded columns
Conditional split for filtering rows
Do not parse or convert columns unnecessarily
In a fixed-width format you can combine adjacent unneeded columns into one
Leave unneeded columns as strings
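The volume-reduction ideas above can be sketched outside SSIS as follows. The column names and the `min_amount` filter are hypothetical; the point is only that rows are filtered early, unneeded columns are never pushed downstream, and only the column the filter needs is converted.

```python
import csv
import io


def reduce_rows(text_file, min_amount):
    """Yield only the needed columns, for only the rows that pass the filter."""
    for record in csv.DictReader(text_file):
        amount = float(record["amount"])      # convert only what is actually used
        if amount < min_amount:               # filter rows as early as possible
            continue
        # Other columns are dropped here; order_id stays a string (never parsed).
        yield {"order_id": record["order_id"], "amount": amount}


if __name__ == "__main__":
    data = io.StringIO("order_id,amount,notes\nA1,5.0,small\nA2,50.0,large\n")
    for row in reduce_rows(data, min_amount=10):
        print(row)
```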
Application
Is SSIS right for this?
Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets.
Is BCP good enough?
Is the greater manageability and control of SSIS needed?
Bulk Insert Task vs. Data Flow
Location
Consider the following configuration …
Text file on Server 1
SQL Server on Server 2
Where should SSIS run?
(Licensing issues aside)
Measuring Performance
OVAL does not provide prescriptive guidance
Too many variables
Improve performance by applying OVAL and measuring
SSIS Logging
Performance counters
SQL Server Profiler
For extract queries, lookups and loading
Parallelism
Focus on critical path
Utilize available resources
Memory Constrained Reader and CPU Constrained
Let it rip! Optimize the slowest
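"Apply OVAL and measure" can be approximated outside SSIS with a simple timing harness. This is a generic sketch, not a substitute for SSIS logging or performance counters; the stage names and functions are hypothetical.

```python
import time


def time_stages(stages, data):
    """Run each (name, func) stage in order and record its elapsed seconds."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def slowest_stage(timings):
    """Return the name of the stage that took longest: the one to optimize first."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    stages = [
        ("extract", lambda _: list(range(100_000))),
        ("transform", lambda rows: sorted(rows, reverse=True)),
        ("load", lambda rows: len(rows)),
    ]
    result, timings = time_stages(stages, None)
    print(result, slowest_stage(timings))
```

Measuring each stage separately shows where the critical path is before any tuning effort is spent.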
Moving Ahead
Manageability Features
Logging and Log Providers
Checkpoint Restartability
Precedence Constraints
Configurations
SSIS Service
Checkpointing
Package Loads: Checkpoint File Created
Data Flow Task: Write Checkpoint
Data Flow Task: Write Checkpoint
Send Mail Task: Write Checkpoint
Package Completes: Checkpoint File Deleted
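The checkpoint life cycle (file created on load, written after each task, deleted on completion) can be mimicked in a few lines. This is a conceptual sketch with a JSON file and hypothetical names; it is not the SSIS checkpoint file format or API.

```python
import json
import os


def run_with_checkpoints(tasks, checkpoint_path):
    """Run (name, func) tasks in order, skipping ones a previous run finished."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:          # restart: read earlier progress
            done = json.load(f)
    else:
        done = []
        with open(checkpoint_path, "w") as f:     # package loads: file created
            json.dump(done, f)
    executed = []
    for name, func in tasks:
        if name in done:
            continue                              # finished in an earlier run
        func()                                    # a failure here leaves the file behind
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as f:     # write checkpoint
            json.dump(done, f)
    os.remove(checkpoint_path)                    # package completes: file deleted
    return executed
```

If a task raises, the file stays on disk, so the next run resumes at the failed task instead of repeating the work that already succeeded.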
Configuration Scenario
Multiple Configurations
Machines where packages are being designed / tested / executed: Dev, Test, Production
Databases: Dev DB, Test DB, Prod DB
Configuration updates package on load with DB locations (and mail server, file share locations…)
Package Handoff
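The idea of one package picking up environment-specific settings on load can be sketched as below. All server names, keys, and function names are hypothetical placeholders, and SSIS stores configurations in its own formats rather than a Python dict.

```python
# Hypothetical per-environment settings; real values would live outside the package.
CONFIGURATIONS = {
    "dev": {"db_server": "DEV-DB", "mail_server": "dev-mail"},
    "test": {"db_server": "TEST-DB", "mail_server": "test-mail"},
    "production": {"db_server": "PROD-DB", "mail_server": "prod-mail"},
}


def load_package(package, environment, configurations=CONFIGURATIONS):
    """Return a copy of the package with that environment's settings applied."""
    configured = dict(package)                      # the handed-off package is unchanged
    configured.update(configurations[environment])  # configuration applied on load
    return configured


if __name__ == "__main__":
    package = {"name": "LoadSales", "db_server": "DEV-DB"}
    print(load_package(package, "production"))
```

The same package definition moves from machine to machine; only the configuration it reads at load time differs.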
Precedence constraints
Directs Flow from object to object…
Basically, ‘when do I move on’
Success, Failure, Completion or one of those plus an expression (condition)
Example: Data Flow Task to Send Mail Task
Constraint options: Success, Completion, Failure, Success & expression
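The "when do I move on" decision can be written as a small predicate. This is a sketch of the logic described above (an execution result, optionally combined with an expression); the function name and string values are hypothetical.

```python
def constraint_satisfied(outcome, required, expression=None):
    """Decide whether flow may move on to the next task.

    outcome:    'success' or 'failure' of the preceding task
    required:   'success', 'failure', or 'completion'
    expression: optional zero-argument callable returning a truthy/falsy value
    """
    if required == "completion":
        status_ok = outcome in ("success", "failure")  # either result counts
    else:
        status_ok = outcome == required
    if expression is None:
        return status_ok
    return status_ok and bool(expression())            # result plus expression


if __name__ == "__main__":
    row_count = 0
    # Send mail only if the data flow succeeded and loaded no rows.
    print(constraint_satisfied("success", "success", lambda: row_count == 0))
```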
Deployment Flow
Tools to organize and ‘copy’ packages and supporting files
BI Studio
• Design Package
• Add Configurations
• Add Miscellaneous files
• Set Project Deployment properties
• Build
User
• Copy/Move Deployment folder\files
• Choose Destination (SQL Server, File System)
• Modify protection level
• Choose location of supporting files
• Change configurations
• Execute Installation Wizard
SQL Agent
• Create desired agent jobs
SQL Management Studio
Utilizes the SSIS service
Allows Monitoring of currently Executing packages
Maintain stored package structure
Ad hoc Package execution
SSIS: Summary
Fast!
Data flows process large volumes of data efficiently - even through complex operations
Exceptional price / performance on multi-core
Feature Rich
Many pre-built adapters and transformations reduce hand coding
Extensible object model enables specialized custom or scripted components
Highly productive visual environment speeds development and debugging
Integral part of a complete BI stack (IS-AS-RS)
Beyond ETL
Enables integration of XML, RSS and Web Services data
Data cleansing features enable “difficult” data to be handled during loading
Data and Text mining allow “smart” handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection
Your Feedback is Important!
Please Fill Out the feedback form
Questions !!!
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.