SSIS
Exploring Scalability, Performance and Deployment
Vinod Kumar & Srinivas Sampath
MVP - SQL Server

Presentation Scope
  A high level view
    Design considerations
    How to measure performance
    Performance implications of architecture
    Manageability aspects of SSIS
    Deployment tips
  Out of scope
    Prescriptive guidance for specific situations

Agenda
  Buffers and Memory
  OVAL Concept Detailed
  Component Specific Notes
  Manageability Features
  Deployment Considerations

SSIS Life Cycle Tools
  Design the SSIS Package
    Business Intelligence Studio (Visual Studio)
    Migration wizard for pre-SQL Server 2005 packages
    Version Control Integration (VSS)
  Deployment/Execution
    Deployment Utility to copy packages
    Command-line and GUI execution (dtexec.exe and dtexecui.exe)
    Flexible Configuration Options
  Supportability
    Rich per-package logging
    SQL Management Studio for monitoring running packages and organizing stored packages
    Checkpoint - Restartability

SSIS Tools
  [Diagram: the tools that work with SSIS packages]
  Tools shown: BI Studio, SSIS Service, Mgt Studio, Import Export Wizard, Deployment installer file set, Dtexec.exe, Dtexecui.exe, Dtutil.exe
  Actions shown: execution, view running and import\export, deploy

Buffers and Memory
  Buffers based on design time metadata
    The width of a row determines the size of the buffer
    Smaller rows = more rows in memory = greater efficiency
  Memory copies are expensive!
    A buffer might have placeholder columns filled by downstream components
    Pointer magic where possible
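
The link between row width and rows per buffer can be sketched with a little arithmetic. This is a simplified model, not the engine's exact sizing algorithm: it assumes the commonly documented defaults for the data flow properties DefaultBufferSize (10 MB) and DefaultBufferMaxRows (10,000), neither of which appears on the slides, and the function name rows_per_buffer is hypothetical.

```python
# Simplified model only: the real engine applies further rules when sizing buffers.
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # bytes; assumed default for DefaultBufferSize
DEFAULT_BUFFER_MAX_ROWS = 10_000         # assumed default for DefaultBufferMaxRows


def rows_per_buffer(row_width_bytes: int,
                    buffer_size: int = DEFAULT_BUFFER_SIZE,
                    max_rows: int = DEFAULT_BUFFER_MAX_ROWS) -> int:
    """Estimate how many rows of a given width fit in one buffer."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    # The row cap applies to narrow rows; the byte cap applies to wide rows.
    return min(max_rows, buffer_size // row_width_bytes)


if __name__ == "__main__":
    for width in (200, 2_000, 20_000):
        print(f"{width:>6} bytes per row -> {rows_per_buffer(width)} rows per buffer")
```

Under these assumptions a 200-byte row is limited by the row cap, while 2,000-byte and 20,000-byte rows are limited by the buffer size, so widening a row directly reduces how many rows travel together.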

Component Types
  Row based (synchronous outputs)
    Logically works at a row level
    Buffer reused
    Examples: Data Conversion, Derived Column
  Partially blocking (asynchronous outputs)
    May logically work at a row level
    Data copied to new buffers
    Examples: Merge, Merge Join, Union All
  Blocking (asynchronous outputs)
    Needs all input buffers before producing any output rows
    Data copied to new buffers
    Examples: Aggregate, Sort
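
The difference between a row-based and a blocking component can be illustrated with Python generators. This is an analogy under stated assumptions, not SSIS code: the function names are hypothetical, rows are plain integers, and buffers are ignored entirely. The point is only the ordering of reads and emits.

```python
from typing import Callable, Iterable, Iterator, List

events: List[str] = []  # records the order in which rows are read and emitted


def source(values: Iterable[int]) -> Iterator[int]:
    for value in values:
        events.append(f"read {value}")
        yield value


def derived_column(rows: Iterable[int]) -> Iterator[int]:
    """Row based (synchronous): each row is emitted as soon as it arrives."""
    for row in rows:
        yield row * 10


def sort_rows(rows: Iterable[int]) -> Iterator[int]:
    """Blocking (asynchronous): all input is consumed before any output."""
    yield from sorted(rows)


def run(transform: Callable[[Iterable[int]], Iterator[int]],
        values: List[int]) -> List[str]:
    """Push values through one transform and return the event order."""
    events.clear()
    for out in transform(source(values)):
        events.append(f"emit {out}")
    return list(events)


if __name__ == "__main__":
    print(run(derived_column, [3, 1, 2]))  # reads and emits interleave
    print(run(sort_rows, [3, 1, 2]))       # every read happens before the first emit
```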

CPU Utilization
  Execution Tree
    Starts from a source or an async output
    Ends at a destination or an input that has no sync outputs
  Each Execution Tree can get a worker thread
  MaxEngineThreads to control parallelism
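
The execution tree rule above can be sketched for the simplest case, a linear pipeline with one path from source to destination. The function execution_trees and the example component list are hypothetical; real data flows can branch and merge, which this sketch does not attempt to model.

```python
from typing import List, Tuple

# (component name, True if the component has an asynchronous output)
Component = Tuple[str, bool]


def execution_trees(pipeline: List[Component]) -> List[List[str]]:
    """Split a linear pipeline into execution trees.

    A tree starts at the source or at an asynchronous output, and ends at the
    destination or at the input of a component whose output is asynchronous.
    """
    trees: List[List[str]] = []
    current: List[str] = []
    for index, (name, is_async) in enumerate(pipeline):
        is_source = index == 0
        if is_async and not is_source:
            current.append(f"{name} (input)")   # the current tree ends here
            trees.append(current)
            current = [f"{name} (output)"]      # a new tree starts here
        else:
            current.append(name)
    if current:
        trees.append(current)
    return trees


if __name__ == "__main__":
    flow = [
        ("Flat File Source", True),
        ("Derived Column", False),
        ("Sort", True),
        ("Lookup", False),
        ("SQL Server Destination", False),
    ]
    for number, tree in enumerate(execution_trees(flow), start=1):
        print(number, " -> ".join(tree))
```

In this example the Sort splits the flow into two trees, so up to two worker threads could be busy at once.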

Performance Strategy
  Use OVAL to identify the factors affecting data integration performance…
    Operations: What logic should be applied to the data?
    Volume: How much data must be processed?
    Application: Which app is best suited to these operations on this volume of data? For example, use SQL Server or SSIS for sorting data?
    Location: Where should the app run? For example, on a shared server, or on a standalone machine?

An OVAL Example: Loading a Text File
  Simple scenario…
  Interesting performance considerations!
  [Diagram: text file on Server 1, SQL Server on Server 2]

Operations
  Understand all operations performed
    1. Open a transaction on SQL Server
    2. Read data from the text file
    3. Load data into the SSIS data flow
    4. Load the data into SQL Server
    5. Commit the transaction
  Beware of hidden operations
    Data conversion in either step 3 or 4

Operations - Sharpen
  File Source
    Unnecessary data type conversions
    ‘FastParse’ in Flat File Source
    Unnecessary operations: e.g., converting from text to datetime, then from datetime to date
  Reduce database operations
    Database logging
    Commit size
    Fast Load
    Table lock
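
The effect of commit size on the number of database operations can be shown with a small batching sketch. It is only an illustration: SQLite from the Python standard library stands in for SQL Server, the table name target and the functions load_rows and demo are hypothetical, and nothing here reflects how an SSIS destination is implemented.

```python
import sqlite3
from typing import Iterable, Tuple


def load_rows(conn: sqlite3.Connection,
              rows: Iterable[Tuple[int, str]],
              commit_size: int) -> int:
    """Insert rows, committing once per `commit_size` rows. Returns the commit count."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final, partially filled batch
        conn.commit()
        commits += 1
    return commits


def demo(commit_size: int, total_rows: int = 1000) -> Tuple[int, int]:
    """Load `total_rows` rows and return (number of commits, rows stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    conn.commit()
    rows = ((i, f"row{i}") for i in range(total_rows))
    commits = load_rows(conn, rows, commit_size)
    (count,) = conn.execute("SELECT COUNT(*) FROM target").fetchone()
    conn.close()
    return commits, count


if __name__ == "__main__":
    for size in (1, 250, 1000):
        print(size, demo(size))
```

The same rows reach the table in every case; only the number of commit round trips changes, which is the cost the slide asks you to reduce.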

Volume
  Reduce where possible
    Don’t push unneeded columns
    Conditional split for filtering rows
  Do not parse or convert columns unnecessarily
    In a fixed-width format you can combine adjacent unneeded columns into one
    Leave unneeded columns as strings

Volume - Sharpen
  Use appropriate data types
    An integer in the range 1-999 takes 2 bytes as an integer, 3 bytes as a string, but 4 bytes as a real
    Suggest Types in the flat file connection manager UI
  Use parallelism
    If loading multiple files, can they be loaded in parallel?
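
The byte counts quoted for the value range 1-999 can be checked with the standard struct module. The sketch assumes a 2-byte signed integer, single-byte (ASCII) text and a 4-byte single-precision real, which is one reasonable reading of the slide; the variable names are illustrative.

```python
import struct

value = 999

as_int16 = struct.pack("<h", value)        # 2-byte signed integer
as_text = str(value).encode("ascii")       # one byte per digit
as_real = struct.pack("<f", float(value))  # 4-byte single-precision float

sizes = {"int16": len(as_int16), "text": len(as_text), "real": len(as_real)}

if __name__ == "__main__":
    print(sizes)
```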

Application
  Is SSIS right for this?
    Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets.
  Is BCP good enough?
    Is the greater manageability and control of SSIS needed?
  Bulk Insert Task vs. Data Flow

Location
  Consider the following configuration…
  [Diagram: text file on Server 1, SQL Server on Server 2]
  Where should SSIS run? (Licensing issues aside)

Location Considerations
  SSIS on Server 1
    Competes with apps for resources
    Will data conversion on Server 1 reduce or increase the volume of data transferred across the network?
    Cannot use the fast SSIS SQL Server Destination
  SSIS on Server 2
    Competes with SQL Server for resources
    Will pulling the unconverted text across the network be expensive?
      Also consider transferring the file unparsed to Server 2 and reading it locally from there
    Can use the fast SSIS SQL Server Destination

Measuring Performance
  OVAL does not provide prescriptive guidance
    Too many variables
  Improve performance by applying OVAL and measuring
    SSIS Logging
    Performance counters
    SQL Server Profiler
      For extract queries, lookups and loading

Parallelism
  Focus on critical path
  Utilize available resources
    Memory Constrained
    Reader and CPU Constrained
  Let it rip!
  Optimize the slowest

Manageability Features
  Logging and Log Providers
  Checkpoint Restartability
  Precedence Constraints
  Configurations
  SSIS Service

Logging and Log Providers
  Log entries are a blend of status and result messages
  Can select what ‘details’ per control flow object within each package (e.g. OnError, OnWarning, OnPreExecute)
  Can select what fields (e.g. computer, operator, ExecutionID…)
  Can define multiple log providers (SQL, text file, Windows Event…) per package

Checkpointing
  [Diagram: the package loads and the checkpoint file is created. A checkpoint is written after each task (Data Flow Task, Data Flow Task, Send Mail Task). When the package completes, the checkpoint file is deleted.]
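
The checkpoint life cycle in the diagram can be sketched as follows. This is a minimal illustration and not the SSIS checkpoint file format: run_package, demo and the JSON file layout are hypothetical, and the file is created at the first checkpoint write rather than at package load.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Task = Tuple[str, Callable[[], None]]


def run_package(tasks: List[Task], checkpoint_path: str) -> List[str]:
    """Run tasks in order, skipping any recorded as done in the checkpoint file."""
    done: List[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)
    executed: List[str] = []
    for name, action in tasks:
        if name in done:
            continue  # completed in an earlier run
        action()  # an exception here leaves the checkpoint file in place
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as handle:
            json.dump(done, handle)  # write a checkpoint after each task
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # package completed: delete the checkpoint
    return executed


def demo() -> Tuple[bool, List[str], bool]:
    """Fail once in the second task, then restart from the checkpoint."""
    path = os.path.join(tempfile.mkdtemp(), "package.chk")
    state = {"fail": True}

    def second_data_flow() -> None:
        if state["fail"]:
            raise RuntimeError("simulated failure")

    tasks: List[Task] = [
        ("Data Flow Task 1", lambda: None),
        ("Data Flow Task 2", second_data_flow),
        ("Send Mail Task", lambda: None),
    ]
    try:
        run_package(tasks, path)
    except RuntimeError:
        pass
    checkpoint_kept = os.path.exists(path)
    state["fail"] = False
    rerun = run_package(tasks, path)
    return checkpoint_kept, rerun, os.path.exists(path)


if __name__ == "__main__":
    print(demo())
```

On the restart only the failed task and the tasks after it run again; the first data flow is not repeated.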

Configurations
  ‘Feed’ changes into a package and alter execution without editing the package directly (e.g. file name to load)
  The ‘feed’ can be sourced from a SQL table, XML file, Registry key, OS environment variable, or a parent package.
  You can apply one or more configuration sets per package, from a mix of sources
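
The idea of feeding values into a package from several sources can be sketched with plain dictionaries. Everything here is illustrative: the property names, the environment variable DEMO_SOURCE_FILE and both functions are hypothetical, a dictionary stands in for a SQL table or XML file, and the rule that a later configuration overrides an earlier one is an assumption of this sketch.

```python
import os
from typing import Dict, Iterable, Mapping


def environment_configuration(variable: str, property_name: str) -> Dict[str, str]:
    """Build a one-entry configuration from an OS environment variable, if set."""
    value = os.environ.get(variable)
    return {} if value is None else {property_name: value}


def apply_configurations(package: Mapping[str, str],
                         configurations: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Return a copy of the package properties with each configuration applied in order."""
    configured = dict(package)  # the designed package itself is left unchanged
    for configuration in configurations:
        configured.update(configuration)
    return configured


if __name__ == "__main__":
    designed = {"SourceFile": "dev_input.txt", "Server": "DEVSQL"}
    production_settings = {"Server": "PRODSQL"}  # stands in for a SQL table or XML file
    os.environ["DEMO_SOURCE_FILE"] = "prod_input.txt"
    print(apply_configurations(
        designed,
        [production_settings,
         environment_configuration("DEMO_SOURCE_FILE", "SourceFile")],
    ))
```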

Configuration Scenario
  [Diagram: package handoff across the Dev, Test and Production machines where packages are designed, tested and executed. Each environment has its own database (Dev DB, Test DB, Prod DB) and multiple configurations are available.]
  Configuration updates package on load with DB locations (and mail server, file share locations…)

Precedence Constraints
  Directs flow from object to object…
  Basically, ‘when do I move on’
  Success, Failure, Completion or one of those plus an expression (condition)
  [Diagram: Dataflow Task linked to SendMail Task by precedence constraints labeled Success, Completion, Failure, and Success & expression]
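
The "when do I move on" decision can be sketched as a small function. The enum and function names are hypothetical and the expression is modeled as a plain Python callable; the sketch only covers the "constraint plus expression" combination named on the slide.

```python
from enum import Enum
from typing import Callable, Optional


class Result(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Constraint(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    COMPLETION = "completion"


def should_run_next(constraint: Constraint,
                    result: Result,
                    expression: Optional[Callable[[], bool]] = None) -> bool:
    """Decide whether the downstream task runs, given the upstream result."""
    if constraint is Constraint.COMPLETION:
        outcome_ok = True  # runs whether the upstream task succeeded or failed
    elif constraint is Constraint.SUCCESS:
        outcome_ok = result is Result.SUCCESS
    else:
        outcome_ok = result is Result.FAILURE
    # With an expression attached, both the outcome and the condition must hold.
    return outcome_ok and (expression is None or expression())


if __name__ == "__main__":
    rows_loaded = 0
    print(should_run_next(Constraint.SUCCESS, Result.SUCCESS))
    print(should_run_next(Constraint.SUCCESS, Result.SUCCESS, lambda: rows_loaded > 0))
```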

Deployment Flow
  Tools to organize and ‘copy’ packages and supporting files
  BI Studio
    • Design Package
    • Add Configurations
    • Add Miscellaneous files
    • Set Project Deployment properties
    • Build
  User
    • Copy/Move Deployment folder\files
  Installation Wizard
    • Choose Destination (SQL or File System)
    • Modify protection level
    • Choose location of supporting files
    • Change configurations
    • Execute
  SQL Agent
    • Create desired agent jobs

SQL Management Studio
  Utilizes the SSIS service
  Allows monitoring of currently executing packages
  Maintain stored package structure
  Ad hoc package execution

Performance of Lookups
  The reference set
    Restrict to only those columns you actually use
    Restrict rows with WHERE if possible
  The lookup cache
    Caching can improve performance
    Full cache
      When the reference set will fit comfortably in memory
    Partial
      Build a cache as the input records are matched
      Useful for duplicate keys in the input, such as SKUs
    None
      Reference set doesn’t fit in memory and partial cache has no advantage
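
The trade-off between the three cache modes can be sketched by counting how often the reference data is queried. The class ReferenceTable and the functions lookup and count_queries are hypothetical stand-ins, a dictionary plays the role of the reference database, and the mode names simply follow the slide.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class ReferenceTable:
    """Stand-in for the reference database; counts how often it is queried."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows
        self.queries = 0

    def fetch_all(self) -> Dict[str, str]:
        self.queries += 1
        return dict(self._rows)

    def fetch_one(self, key: str) -> Optional[str]:
        self.queries += 1
        return self._rows.get(key)


def lookup(keys: Iterable[str], table: ReferenceTable, mode: str) -> List[Optional[str]]:
    if mode == "full":
        cache = table.fetch_all()  # load the whole reference set up front
        return [cache.get(key) for key in keys]
    if mode == "partial":
        partial: Dict[str, Optional[str]] = {}
        matched = []
        for key in keys:
            if key not in partial:  # query only the first time a key is seen
                partial[key] = table.fetch_one(key)
            matched.append(partial[key])
        return matched
    if mode == "none":
        return [table.fetch_one(key) for key in keys]  # query for every row
    raise ValueError(f"unknown cache mode: {mode}")


def count_queries(mode: str, keys: List[str]) -> Tuple[List[Optional[str]], int]:
    table = ReferenceTable({"A": "Apple", "B": "Banana", "C": "Cherry"})
    return lookup(keys, table, mode), table.queries


if __name__ == "__main__":
    skus = ["A", "B", "A", "A", "B"]
    for mode in ("full", "partial", "none"):
        print(mode, count_queries(mode, skus))
```

With repeated keys in the input, the partial cache queries once per distinct key, which is the duplicate-SKU case the slide mentions.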

Performance of Aggregate
  Majority of work happens in ProcessInput call.
  This is on the thread in the previous execution tree!
  Memory requirements depend on how ‘deep’ the aggregations are
  Can reuse buckets if one agg can be derived from another
  Use when memory is limited, single threaded operation
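
One way to read "one aggregation derived from another" is a roll-up: totals at a coarse grouping level are computed from the totals at a finer level instead of from the raw rows. The sketch below illustrates that reading only; it is an assumption about what the slide means, and the row layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (country, city, amount)


def sum_by_country_city(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Fine-grained aggregation: one bucket per (country, city)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for country, city, amount in rows:
        totals[(country, city)] += amount
    return dict(totals)


def sum_by_country(fine: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Coarse aggregation derived from the fine buckets, not from the raw rows."""
    totals: Dict[str, int] = defaultdict(int)
    for (country, _city), subtotal in fine.items():
        totals[country] += subtotal
    return dict(totals)


if __name__ == "__main__":
    sales = [
        ("IN", "Chennai", 10),
        ("IN", "Bangalore", 5),
        ("IN", "Chennai", 7),
        ("US", "Seattle", 3),
    ]
    by_city = sum_by_country_city(sales)
    print(by_city)
    print(sum_by_country(by_city))
```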

Performance of Sort
  ProcessInput hangs on to the incoming data
  PrimeOutput does the sort and is the expensive part
  Sort needs all data to be in memory
  Sort can have unpredictable CPU requirements
  Merging is single threaded
  Stock Sort component will be good enough for most users
  Third party (“fastest sort in the world”) available if you really need it

Swapping Buffers
  When physical memory is not available
  Each buffer gets written out to one file
  Multiple paths can be specified for swapping buffers
    BufferTempStoragePath property on the Pipeline
  Do everything in your power to avoid swapping
    Else, performance is really unpredictable
    Options: 64-bit, out-of-process execution, serializing operations

SSIS: Summary
  Fast!
    Data flows process large volumes of data efficiently, even through complex operations
    Exceptional price / performance on multi-core
  Feature Rich
    Many pre-built adapters and transformations reduce hand coding
    Extensible object model enables specialized custom or scripted components
    Highly productive visual environment speeds development and debugging
    Integral part of a complete BI stack (IS-AS-RS)
  Beyond ETL
    Enables integration of XML, RSS and Web Services data
    Data cleansing features enable “difficult” data to be handled during loading
    Data and Text mining allow “smart” handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection

Your Feedback is Important!
  Please fill out the feedback form

Links & Resources
  SQL Server Integration Services public site
    http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx
  SQL Server Business Intelligence public site
    http://www.microsoft.com/sql/evaluation/bi/default.asp
  SSIS MVPs community site
    http://www.sqlis.com
  Newsgroups
    microsoft.private.sqlserver2005.dts
  Vinod Kumar, MVP-SQL Server, www.ExtremeExperts.com
    Intel Technology India Pvt.
  Srinivas Sampath, MVP-SQL Server, www32.brinkster.com/srisamp
    SCT Software Solutions
    srisamp@