Date posted: 19-Jan-2016
SSIS: From the Asynchronous to the Synchronous
John Mathews
Why? CapEx + OpEx
OpEx
How?
Pipelines
Sources
SQL Server
Transformations
Lookups
Full Blockers
Destinations
Partitioned Tables
Pipelines
Asynchronous, Synchronous, Blocking
Buffers
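The three pipeline behaviours can be illustrated outside SSIS with C# iterators. This is a minimal analogy, not the SSIS buffer API: the class and method names are hypothetical and each "row" is just an int.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of pipeline shapes; not SSIS code.
public static class PipelineShapes
{
    // Synchronous: exactly one output row per input row, emitted as each row arrives.
    public static IEnumerable<int> Synchronous(IEnumerable<int> rows, Func<int, int> transform)
    {
        foreach (var row in rows)
            yield return transform(row);
    }

    // Asynchronous (non-blocking): still streams, but the number of output rows
    // can differ from the number of input rows.
    public static IEnumerable<int> Asynchronous(IEnumerable<int> rows, Func<int, IEnumerable<int>> expand)
    {
        foreach (var row in rows)
            foreach (var output in expand(row))
                yield return output;
    }

    // Blocking: nothing is emitted until every input row has been read, because the
    // first output row may depend on the last input row (here, a sort).
    public static IEnumerable<int> Blocking(IEnumerable<int> rows)
    {
        var all = new List<int>(rows);
        all.Sort();
        foreach (var row in all)
            yield return row;
    }
}
```

For example, SSIS's Derived Column transformation is synchronous, while its Sort transformation is fully blocking.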
Moving to Azure
Hardware downsizing
Professor John Nash (1928 - 2015)
I've always believed in numbers and the equations and logics that lead to reason.
But after a lifetime of such pursuits, I ask,
“What truly is logic?”
“Who decides reason?”
My quest has taken me through the physical, the metaphysical, the delusional -- and back.
Lookups
Late arriving dimensions
Denormalised data
Alternatives
Slowly changing dimensions
One to many lookups
Lookups: Target structures
Owner: FirstName, MiddleNames, Surname
Item: Name
Collection: Name, ValidFrom, ValidTo
Lookups: Source
4,432,277 rows
Lookups: 2-pass solution
01:24.672
31.547 sec
52.141 sec
Lookups: Late dimensions
Diagram (BS 6224:1987):
Foreach row
  Dimension full-cache lookup successful? Yes: set resolution from full cache
  No: Dimension partial-cache lookup successful? Yes: set resolution from partial cache
  No: add value to dimension, then set resolution with value in dimension
  Send row on
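The late-dimension flow above can be modelled in a few lines of C#. This is a sketch of the logic only, not SSIS component code: `LateArrivingResolver`, `queryDimension` (standing in for the partial-cache query against the dimension) and `addToDimension` (standing in for the write to the dimension) are hypothetical names. The `DatabaseQueries` counter makes the cost visible: a row that misses both the full cache and the in-memory part of the partial cache pays for a database round trip.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of: full cache -> partial cache -> add value to dimension.
public sealed class LateArrivingResolver
{
    private readonly IReadOnlyDictionary<string, int> fullCache;   // loaded once, never refreshed
    private readonly Dictionary<string, int> partialCache = new Dictionary<string, int>();
    private readonly Func<string, int?> queryDimension;            // stands in for a SELECT per miss
    private readonly Func<string, int> addToDimension;             // stands in for the INSERT

    public int DatabaseQueries { get; private set; }

    public LateArrivingResolver(IReadOnlyDictionary<string, int> fullCache,
                                Func<string, int?> queryDimension,
                                Func<string, int> addToDimension)
    {
        this.fullCache = fullCache;
        this.queryDimension = queryDimension;
        this.addToDimension = addToDimension;
    }

    public int Resolve(string businessKey)
    {
        // 1. Full-cache lookup: answered from memory.
        if (fullCache.TryGetValue(businessKey, out var surrogateKey))
            return surrogateKey;

        // 2. Partial-cache lookup: memory first, otherwise a database round trip.
        if (partialCache.TryGetValue(businessKey, out surrogateKey))
            return surrogateKey;
        DatabaseQueries++;
        int? found = queryDimension(businessKey);
        if (found.HasValue)
        {
            partialCache[businessKey] = found.Value;
            return found.Value;
        }

        // 3. No match anywhere: add the value to the dimension and use the new key.
        return addToDimension(businessKey);
    }
}
```

In this model a member added in step 3 is not cached, so the next row with the same key goes back to the database once before the partial cache picks it up.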
Lookups: Late dimension solution
> 4 MINUTES!
[Data-flow diagram row counts: 4,432,277; 0; 1,401,849; 3,030,428 (1,401,849 + 3,030,428 = 4,432,277)]
But there are only 100 Owners!
Full cache lookup -> (No match) -> Partial cache lookup -> (No match) -> OLEDB To Write Record
Both lookups also have a Match output.
Lookups: Active - Theory
Diagram (BS 6224:1987):
Start
Preload cache
Foreach row
  Cache lookup not successful? Yes: update dimension & cache
  Update row from cache
  Send row on
End
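The active-lookup theory can be sketched in plain C#. This models the logic, not SSIS script-component code; `ActiveLookup` and the `addToDimension` callback are hypothetical. Unlike the late-dimension flow, a miss updates the in-memory cache as well as the dimension, so in this sketch each new member costs one write and no further round trips.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical active lookup: preload once, then keep the cache in step with the dimension.
public sealed class ActiveLookup
{
    private readonly Dictionary<string, int> cache = new Dictionary<string, int>();
    private readonly Func<string, int> addToDimension;   // hypothetical INSERT returning the new surrogate key

    public int Inserts { get; private set; }

    public ActiveLookup(IEnumerable<KeyValuePair<string, int>> preload, Func<string, int> addToDimension)
    {
        // Preload cache: read the whole dimension once, before any rows arrive.
        foreach (var pair in preload)
            cache[pair.Key] = pair.Value;
        this.addToDimension = addToDimension;
    }

    // Called once per row; the return value is written onto the row ("update row from cache").
    public int Resolve(string businessKey)
    {
        if (!cache.TryGetValue(businessKey, out var surrogateKey))
        {
            // Cache lookup not successful: update the dimension AND the cache,
            // so later rows with the same key are answered from memory.
            surrogateKey = addToDimension(businessKey);
            cache.Add(businessKey, surrogateKey);
            Inserts++;
        }
        return surrogateKey;
    }
}
```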
Lookups: Active - Implementation
Dictionary<Key, Value>
Immutable Object
Configuration
Inputs
Outputs
“Clean Code”, Robert C. Martin (Uncle Bob)
Connection Managers
Pre Execute
private readonly Dictionary<Owner, int> cache = new Dictionary<Owner, int>();
Key’s properties set through constructor
Key’s properties are READ ONLY { get; private set; }
Need to override GetHashCode() & Equals() of Key
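A minimal sketch of a dictionary key that follows the three rules above, using the Owner columns from the target structures. The ordinal (case-sensitive) string comparison is an assumption; a real key should match the collation of the dimension. The manual hash avoids `HashCode.Combine`, which is not available on the .NET Framework versions that SSIS script components typically target.

```csharp
using System;

// Hypothetical immutable key for Dictionary<Owner, int>.
public sealed class Owner : IEquatable<Owner>
{
    // Set once through the constructor; read-only to everything outside the class.
    public string FirstName { get; private set; }
    public string MiddleNames { get; private set; }
    public string Surname { get; private set; }

    public Owner(string firstName, string middleNames, string surname)
    {
        FirstName = firstName;
        MiddleNames = middleNames;
        Surname = surname;
    }

    // Two Owner instances are the same key when every property matches.
    public bool Equals(Owner other) =>
        other != null &&
        string.Equals(FirstName, other.FirstName, StringComparison.Ordinal) &&
        string.Equals(MiddleNames, other.MiddleNames, StringComparison.Ordinal) &&
        string.Equals(Surname, other.Surname, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Owner);

    // Must agree with Equals: equal keys always produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (FirstName == null ? 0 : FirstName.GetHashCode());
            hash = hash * 31 + (MiddleNames == null ? 0 : MiddleNames.GetHashCode());
            hash = hash * 31 + (Surname == null ? 0 : Surname.GetHashCode());
            return hash;
        }
    }
}
```

Because the properties cannot change after construction, a key's hash code cannot drift while it sits in the dictionary.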
Lookups: Active - Performance
24.547 sec
22% (24.547 sec is 22% less than the 31.547 sec shown earlier)
Lookups: One to many
Pre Execute
private readonly Dictionary<Key, IEnumerable<Value>> cache = new Dictionary<Key, IEnumerable<Value>>();
Output is asynchronous
Output row count per input row = number of values retrieved from dictionary
Input columns copied to output
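A sketch of the one-to-many behaviour in plain C#, assuming hypothetical `InputRow`/`OutputRow` shapes in which an integer OwnerId maps to several ItemIds. As the bullets above describe, the output is asynchronous: each input row produces as many output rows as there are cached values (none on a miss), and each output row carries copies of the input columns.

```csharp
using System.Collections.Generic;

// Hypothetical row shapes for illustration.
public sealed class InputRow { public int OwnerId; public string Description; }
public sealed class OutputRow { public int OwnerId; public string Description; public int ItemId; }

public sealed class OneToManyLookup
{
    private readonly Dictionary<int, IEnumerable<int>> cache;

    public OneToManyLookup(Dictionary<int, IEnumerable<int>> cache) { this.cache = cache; }

    public IEnumerable<OutputRow> Process(IEnumerable<InputRow> input)
    {
        foreach (var row in input)
        {
            // A miss produces no output rows for this input row.
            if (!cache.TryGetValue(row.OwnerId, out var itemIds))
                continue;

            // One output row per value retrieved from the dictionary,
            // with the input columns copied across.
            foreach (var itemId in itemIds)
                yield return new OutputRow { OwnerId = row.OwnerId, Description = row.Description, ItemId = itemId };
        }
    }
}
```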
Alternative Lookups
[Diagram 1: Lookup 1 (Not matched) -> Lookup 2 (Not matched) -> Default value (fx); the Matched outputs and the default branch are combined with Union All]
[Diagram 2: the same chain, with the outputs combined by two Merge transformations]
[Diagram: Lookup 1 -> Lookup 2 -> Lookup 3 -> Coalesce]
Each lookup adds its own value
Conditionally apply slow lookups
Select most appropriate result
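The coalesce pattern can be sketched as follows. The priority order (Lookup 1, then 2, then 3, then a default) and the rule that the slow lookup runs only when both fast lookups miss are assumptions for illustration; `AlternativeLookups` and `slowLookup3` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: each lookup contributes its own (nullable) value; Coalesce picks the result.
public sealed class AlternativeLookups
{
    private readonly Dictionary<string, int> lookup1;
    private readonly Dictionary<string, int> lookup2;
    private readonly Func<string, int?> slowLookup3;   // e.g. a database round trip

    public int SlowCalls { get; private set; }

    public AlternativeLookups(Dictionary<string, int> lookup1, Dictionary<string, int> lookup2,
                              Func<string, int?> slowLookup3)
    {
        this.lookup1 = lookup1;
        this.lookup2 = lookup2;
        this.slowLookup3 = slowLookup3;
    }

    public int Resolve(string key, int defaultValue)
    {
        // Each lookup adds its own value; a miss leaves it null.
        int? value1 = lookup1.TryGetValue(key, out var v1) ? v1 : (int?)null;
        int? value2 = lookup2.TryGetValue(key, out var v2) ? v2 : (int?)null;

        // Conditionally apply the slow lookup: only when both fast lookups missed.
        int? value3 = null;
        if (value1 == null && value2 == null)
        {
            SlowCalls++;
            value3 = slowLookup3(key);
        }

        // Coalesce: the first non-null value in priority order, else the default.
        return value1 ?? value2 ?? value3 ?? defaultValue;
    }
}
```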
Fully Blocking Transformations
Distinct
Sort
Aggregate
Distinct: SSIS' algorithm
Diagram (BS 6224:1987):
Start
Create empty cache for rows
Foreach row
  Create copy of row
  Cache lookup not successful? Yes: add to cache
Write cache out
End
28%
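A C# sketch of the Distinct algorithm in the diagram, with a `HashSet<T>` as the cache. For contrast it also shows a streaming variant that emits each row the first time it is seen; that variant is an alternative for comparison, not a claim about where the 28% came from. The class name is hypothetical.

```csharp
using System.Collections.Generic;

public static class DistinctVariants
{
    // The diagram: cache every distinct row, then write the cache out at the end.
    // Nothing leaves the component until the input is exhausted (fully blocking).
    public static IEnumerable<T> BlockingDistinct<T>(IEnumerable<T> rows)
    {
        var cache = new HashSet<T>();
        var ordered = new List<T>();
        foreach (var row in rows)
            if (cache.Add(row))          // Add returns false for a row already in the cache
                ordered.Add(row);
        foreach (var row in ordered)
            yield return row;
    }

    // Streaming alternative: emit each row on first sight, so downstream work starts at once.
    public static IEnumerable<T> StreamingDistinct<T>(IEnumerable<T> rows)
    {
        var seen = new HashSet<T>();
        foreach (var row in rows)
            if (seen.Add(row))
                yield return row;
    }
}
```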
Sorting
Is all of the data required?
Partitioning is your friend!
4,432,277 rows
44,323 rows
x100
Columns: OwnerId, CollectionId, ItemId, ValidFrom, ValidTo
Split into: Partition Key (1) -> Sort Key (many) -> Sort Data (many)
SortKey implements IComparable & IComparable<Key>
Dictionary points to IEnumerable<SortData>
Sorting: Implementation
Diagram (BS 6224:1987):
Start
Foreach row
  Create Partition Key for row
  Create Sort Key for row
  Create Sort Data for row
  Is partition key of row different to current key? Yes: write out, then clear sorter
  Set current key to partition key of row
  Add sort key & data to sorter
Write out, then clear sorter
End
Pre Execute
public override void Input0_ProcessInput(Input0Buffer Buffer)
{
    while (Buffer.NextRow()) { ProcessRow(Buffer); }
    // Input exhausted: write out the final partition
    if (Buffer.EndOfRowset()) { WriteoutSorter(); }
}
private readonly SortedDictionary<SortKey, IEnumerable<SortData>> sorter = new SortedDictionary<SortKey, IEnumerable<SortData>>();
private PartitionKey currentPartitionKey = null;
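A sketch of the partitioned sort in plain C# rather than against `Input0Buffer`, assuming rows already arrive grouped by the partition key (here OwnerId, e.g. because they are read partition by partition). `Row`, `SortKey` and `PartitionedSorter` are hypothetical, and for brevity the whole row plays the role of the sort data. Only one partition is held in the `SortedDictionary` at a time, and duplicate sort keys survive because each key maps to a list of rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row: OwnerId is the partition key; (CollectionId, ItemId) is the sort key.
public sealed class Row { public int OwnerId; public int CollectionId; public int ItemId; }

public readonly struct SortKey : IComparable, IComparable<SortKey>
{
    public SortKey(int collectionId, int itemId) { CollectionId = collectionId; ItemId = itemId; }
    public int CollectionId { get; }
    public int ItemId { get; }

    public int CompareTo(SortKey other)
    {
        int byCollection = CollectionId.CompareTo(other.CollectionId);
        return byCollection != 0 ? byCollection : ItemId.CompareTo(other.ItemId);
    }

    public int CompareTo(object obj) =>
        obj is SortKey other ? CompareTo(other) : throw new ArgumentException("Not a SortKey", nameof(obj));
}

public static class PartitionedSorter
{
    public static IEnumerable<Row> Sort(IEnumerable<Row> rows)
    {
        var sorter = new SortedDictionary<SortKey, List<Row>>();
        int? currentPartition = null;
        foreach (var row in rows)
        {
            // Partition key changed: write out, then clear the sorter.
            if (currentPartition.HasValue && row.OwnerId != currentPartition.Value)
                foreach (var sorted in Drain(sorter)) yield return sorted;

            currentPartition = row.OwnerId;
            var key = new SortKey(row.CollectionId, row.ItemId);
            if (!sorter.TryGetValue(key, out var bucket))
                sorter.Add(key, bucket = new List<Row>());
            bucket.Add(row);
        }
        // End of input: write out the last partition.
        foreach (var sorted in Drain(sorter)) yield return sorted;
    }

    // Copies the sorted rows out eagerly, then clears the sorter for the next partition.
    private static List<Row> Drain(SortedDictionary<SortKey, List<Row>> sorter)
    {
        var snapshot = new List<Row>();
        foreach (var bucket in sorter.Values) snapshot.AddRange(bucket);
        sorter.Clear();
        return snapshot;
    }
}
```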
Aggregation
Pre Execute
Structure identical to sort, except:
One-to-one map between key and data
Data is now MUTABLE: updated as aggregation progresses
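The aggregation variant, sketched under the same assumption that rows arrive grouped by OwnerId. `Sale`, `GroupTotals` and `PartitionedAggregator` are hypothetical; each group key maps to exactly one mutable `GroupTotals` object that is updated in place as rows arrive, and the groups are written out whenever the partition key changes.

```csharp
using System.Collections.Generic;

// Hypothetical input row and mutable accumulator.
public sealed class Sale { public int OwnerId; public int CollectionId; public decimal Amount; }

public sealed class GroupTotals
{
    public int Count;      // mutable: updated as aggregation progresses
    public decimal Sum;
}

public static class PartitionedAggregator
{
    public static IEnumerable<(int OwnerId, int CollectionId, GroupTotals Totals)> Aggregate(IEnumerable<Sale> rows)
    {
        var groups = new SortedDictionary<int, GroupTotals>();   // one accumulator per key
        int? currentOwner = null;
        foreach (var row in rows)
        {
            // Partition key changed: write out the finished groups, then clear.
            if (currentOwner.HasValue && row.OwnerId != currentOwner.Value)
            {
                foreach (var pair in groups) yield return (currentOwner.Value, pair.Key, pair.Value);
                groups.Clear();
            }
            currentOwner = row.OwnerId;

            if (!groups.TryGetValue(row.CollectionId, out var totals))
                groups.Add(row.CollectionId, totals = new GroupTotals());
            totals.Count++;
            totals.Sum += row.Amount;
        }
        // End of input: write out the last partition's groups.
        if (currentOwner.HasValue)
            foreach (var pair in groups) yield return (currentOwner.Value, pair.Key, pair.Value);
    }
}
```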
Putting it all together
38.890 sec (from 01:24.672)
54% less run time
Helping out SQL Server
25 HOURS! (250 million rows)
Read Keys
Partitioned Left Hash Lookup
Partitioned Right Hash Lookup
Partitioned Sort (to cluster key for data)
Read Data (via cluster key)
Merge Join
90 minutes! (250 million rows)
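The final step relies on a merge join, which only works because both inputs arrive sorted on the join key (hence the partitioned sort to the cluster key). A generic sketch of an inner merge join in C#, not the SSIS Merge Join component itself; `MergeJoin.Inner` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class MergeJoin
{
    // Inner merge join of two inputs that are both sorted ascending by key.
    // Each input is read once, in order; only the current run of equal right-side keys is buffered.
    public static IEnumerable<(TKey Key, TLeft Left, TRight Right)> Inner<TKey, TLeft, TRight>(
        IEnumerable<TLeft> left, Func<TLeft, TKey> leftKey,
        IEnumerable<TRight> right, Func<TRight, TKey> rightKey)
        where TKey : IComparable<TKey>
    {
        using (var l = left.GetEnumerator())
        using (var r = right.GetEnumerator())
        {
            bool hasLeft = l.MoveNext(), hasRight = r.MoveNext();
            while (hasLeft && hasRight)
            {
                var key = leftKey(l.Current);
                int cmp = key.CompareTo(rightKey(r.Current));
                if (cmp < 0) { hasLeft = l.MoveNext(); continue; }    // advance the smaller side
                if (cmp > 0) { hasRight = r.MoveNext(); continue; }

                // Collect the run of right rows sharing this key...
                var run = new List<TRight>();
                while (hasRight && rightKey(r.Current).CompareTo(key) == 0)
                {
                    run.Add(r.Current);
                    hasRight = r.MoveNext();
                }
                // ...and pair it with every left row carrying the same key (many-to-many).
                while (hasLeft && leftKey(l.Current).CompareTo(key) == 0)
                {
                    foreach (var match in run) yield return (key, l.Current, match);
                    hasLeft = l.MoveNext();
                }
            }
        }
    }
}
```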
Bulk Loading Partitioned Tables
[Diagram: Data Source -> Transform -> Partitioned Table]
[Diagram: Data Source -> Switch-in Tables]
Physical partitions (sort order)
Time-based partitioning
Destination
Task Parallel Library (.NET 4.5): System.Threading.Tasks.Dataflow
Data Source -> Batch Manager -> BatchBlocks -> ActionBlock (multi-threaded wrapper around a table writer utilising SqlBulkCopy)
Configure:
• Size of batches
• Number of writer threads
Key: TPL component / Custom code
Setup: Action Block
Batch Manager: data processor
Close down
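A minimal sketch of the batch manager's three phases (setup, processing, close down) using the TPL Dataflow `BatchBlock<T>` and `ActionBlock<T>` types. `BatchManager` and the `writeBatch` delegate are hypothetical; in the design above the delegate would wrap a table writer using `SqlBulkCopy`, which is left out so the sketch runs without a database. On .NET Framework the dataflow types come from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical batch manager: rows in, fixed-size batches out to parallel writers.
public sealed class BatchManager<T>
{
    private readonly BatchBlock<T> batcher;
    private readonly ActionBlock<T[]> writers;

    // Setup: configure the batch size and the number of writer threads.
    public BatchManager(int batchSize, int writerThreads, Action<T[]> writeBatch)
    {
        batcher = new BatchBlock<T>(batchSize);
        writers = new ActionBlock<T[]>(writeBatch,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = writerThreads });
        batcher.LinkTo(writers, new DataflowLinkOptions { PropagateCompletion = true });
    }

    // Data processor: called once per row.
    public void Add(T row)
    {
        if (!batcher.Post(row))
            throw new InvalidOperationException("Batch manager is no longer accepting rows.");
    }

    // Close down: flush the final (possibly smaller) batch and wait for every writer.
    public Task CloseAsync()
    {
        batcher.Complete();
        return writers.Completion;
    }
}
```

Because `PropagateCompletion` is set, completing the `BatchBlock` releases the last partial batch and then lets the `ActionBlock` finish once every batch has been written.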
Summary
KISS: Keep It Short and Simple
New tools are available, but one size does not fit all
Utilise partitioning
You have the power
Developer’s Guide
IT DEPENDS!
Thank you
So, it is possible to move from the Asynchronous to(wards) the Synchronous
@jm99a