
CapEx + OpEXOpEx Pipelines Sources SQL Server Transformations LookupsFull Blockers Destinations...

Date post: 19-Jan-2016
SSIS: From the Asynchronous to the Synchronous John Mathews
Transcript
Page 1:

SSIS: From the Asynchronous to the Synchronous

John Mathews

Page 2:

Why?

CapEx + OpEx

OpEx

Page 3:

How?

Pipelines

Sources

SQL Server

Transformations

Lookups

Full Blockers

Destinations

Partitioned Tables

Page 4:

Pipelines

Asynchronous Synchronous

Blocking

Buffers

Page 5:

Moving to Azure

Hardware Downsizing

Page 6:

Professor John Nash (1928 - 2015)

I've always believed in numbers and the equations and logics that lead to reason.

But after a lifetime of such pursuits, I ask,

“What truly is logic?”

“Who decides reason?”

My quest has taken me through the physical, the metaphysical, the delusional -- and back.

Page 7:

Lookups

Late arriving dimensions

Denormalised data

Alternatives

Slowly changing dimensions

One to many lookups

Page 8:

Lookups

Target structures

Owner: FirstName, MiddleNames, Surname

Item: Name

Collection: Name, ValidFrom, ValidTo

Page 9:

Lookups

Target structures

Page 10:

Lookups

Source

4,432,277 rows

Page 11:

Lookups

2 pass solution

01:24.672

Page 12:

Lookups

2 pass solution

31.547 sec

Page 13:

Lookups

2 pass solution

52.141 sec

Page 14:

Lookups

Late Dimensions

Foreach Row

Send Row on

Dimension partial-cache lookup successful?

Dimension full-cache lookup successful?

Set resolution from full-cache

Set resolution with value in dimension

Add value to dimension

Set resolution from partial-cache

BS6224:1987

Page 15:

Lookups

Late Dimension Solution

> 4 MINUTES!!!!!

Row counts: 4,432,277; 4,432,277; 0; 1,401,849; 1,401,849; 1,401,849; 3,030,428

But there are only 100 Owners!

Page 16:

Lookups

Late Dimension Solution

Full cache lookup

Partial cache lookup

OLEDB To Write Record

No match

No match

Match

Match

Page 20:

Lookups

Active - Theory

Cache lookup not successful?

Update dimension & cache

Preload cache

Foreach Row

Send Row on

Update Row from cache

Start

End

BS6224:1987
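The flowchart above can be sketched in stand-alone C#, outside SSIS. This is only an illustration under assumptions: the class name `ActiveLookup`, the `Resolve` method, and the `addToDimension` callback (which stands in for whatever inserts a new member into the dimension table and returns its surrogate key) are hypothetical and do not come from the slides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the "active lookup" flow: preload the cache,
// then resolve each row from it, extending dimension and cache on a miss.
public sealed class ActiveLookup<TKey>
{
    private readonly Dictionary<TKey, int> cache = new Dictionary<TKey, int>();
    private readonly Func<TKey, int> addToDimension;

    // preload: existing dimension members (key -> surrogate id).
    // addToDimension: assumed callback that writes a new member and returns its id.
    public ActiveLookup(IEnumerable<KeyValuePair<TKey, int>> preload,
                        Func<TKey, int> addToDimension)
    {
        foreach (var pair in preload)
        {
            cache[pair.Key] = pair.Value;
        }
        this.addToDimension = addToDimension;
    }

    // Called once per row: returns the surrogate id to stamp on the row.
    public int Resolve(TKey key)
    {
        int id;
        if (!cache.TryGetValue(key, out id))
        {
            // Cache miss: update the dimension, then the cache.
            id = addToDimension(key);
            cache.Add(key, id);
        }
        return id;
    }
}
```

Because the cache is updated in the same step as the dimension, a key that first appears mid-stream costs one write and every later occurrence is an in-memory hit.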

Page 21:

Lookups

Active - Implementation

Page 22:

Lookups

Active - Implementation

Dictionary<Key, Value>

Immutable Object

Configuration

Page 23:

Lookups

Active - Implementation

Inputs

Page 24:

Lookups

Active - Implementation

Outputs

Page 25:

Lookups

Active - Implementation

“Clean Code” Robert C. Martin (Uncle Bob)

Page 26:

Lookups

Active - Implementation

Connection Managers

Page 27:

Lookups

Active - Implementation

Pre Execute

private readonly Dictionary<Owner, int> cache =
    new Dictionary<Owner, int>();

Key’s properties set through constructor

Key’s properties are READ ONLY {get; private set;}

Need to override GetHashCode() & Equals() of Key
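A minimal sketch of a dictionary key that follows the three points above. The property names are taken from the Owner target structure shown earlier in the deck; the ordinal (case-sensitive) string comparison and the null-to-empty normalisation are assumptions, and a real package would need to match the collation of the dimension table.

```csharp
using System;
using System.Collections.Generic;

// Immutable key: properties are set only through the constructor.
public sealed class Owner : IEquatable<Owner>
{
    public Owner(string firstName, string middleNames, string surname)
    {
        // Assumption: treat null as empty so hashing never dereferences null.
        FirstName = firstName ?? string.Empty;
        MiddleNames = middleNames ?? string.Empty;
        Surname = surname ?? string.Empty;
    }

    public string FirstName { get; private set; }
    public string MiddleNames { get; private set; }
    public string Surname { get; private set; }

    // Value equality over all three properties (ordinal comparison).
    public bool Equals(Owner other)
    {
        return !ReferenceEquals(other, null)
            && FirstName == other.FirstName
            && MiddleNames == other.MiddleNames
            && Surname == other.Surname;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Owner);
    }

    // Must agree with Equals: equal owners produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + FirstName.GetHashCode();
            hash = hash * 31 + MiddleNames.GetHashCode();
            hash = hash * 31 + Surname.GetHashCode();
            return hash;
        }
    }
}
```

Without the overrides, `Dictionary<Owner, int>` would compare keys by reference, so a freshly constructed `Owner` for an incoming row would never match the preloaded entry. Keeping the key immutable ensures its hash code cannot change while it sits in the dictionary.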

Page 28:

Lookups

Active - Implementation

Page 33:

Lookups

Active - Implementation

Pre Execute

Page 34:

Lookups

Active - Performance

24.547 sec

22%

Page 35:

Lookups

One to Many

Pre Execute

private readonly Dictionary<Key, IEnumerable<Value>> cache =
    new Dictionary<Key, IEnumerable<Value>>();

Output is Asynchronous

Output row count per input row = number of values retrieved from dictionary

Input columns copied to output
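The one-to-many behaviour described above can be illustrated without SSIS. In this sketch the names `OneToManyLookup`, `Expand`, and `keyOf` are hypothetical; each output element pairs the original input row (standing in for "input columns copied to output") with one looked-up value.

```csharp
using System;
using System.Collections.Generic;

public static class OneToManyLookup
{
    // Emits one output per value found for the row's key.
    // A row whose key is missing, or maps to no values, emits nothing.
    public static IEnumerable<KeyValuePair<TInput, TValue>> Expand<TInput, TKey, TValue>(
        IEnumerable<TInput> inputRows,
        Func<TInput, TKey> keyOf,
        Dictionary<TKey, IEnumerable<TValue>> cache)
    {
        foreach (var row in inputRows)
        {
            IEnumerable<TValue> values;
            if (!cache.TryGetValue(keyOf(row), out values))
            {
                continue;
            }
            foreach (var value in values)
            {
                // Input row carried through alongside each matched value.
                yield return new KeyValuePair<TInput, TValue>(row, value);
            }
        }
    }
}
```

The output has to be asynchronous in SSIS terms because the number of rows leaving the component differs from the number entering it, although rows can still be emitted as they arrive rather than after the whole input is read.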

Page 36:

Alternative Lookups

Lookup 1

Lookup 2

Union All

Default value

fx

Matched

Not matched

Matched

Not matched

Lookup 1

Lookup 2

Default value

fx

Not matched

Not matched

Matched

Matched

Merge

Merge

Page 37:

Alternative Lookups

Lookup 3

Each lookup adds its own value

Conditionally apply slow lookups

Select most appropriate result

Coalesce

Lookup 2

Lookup 1
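One way to picture the chained lookups with a final coalesce is the sketch below. It is a simplification under assumptions: on the slide each lookup adds its own column and a later step coalesces them, whereas here the chain is collapsed into a single method that stops at the first match, which also means slower lookups placed later only run when the faster ones miss. The names `ChainedLookups` and `Resolve` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ChainedLookups
{
    // Tries each lookup in priority order; a lookup returns null for "not matched".
    // The first non-null result wins, otherwise the default value is used.
    public static int Resolve(string key, IList<Func<string, int?>> lookups, int defaultValue)
    {
        foreach (var lookup in lookups)
        {
            int? result = lookup(key);
            if (result.HasValue)
            {
                return result.Value;
            }
        }
        return defaultValue;
    }
}
```

Ordering the list from cheapest to most expensive keeps every row on a single path, so no Union All or Merge is needed to bring matched and unmatched rows back together.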

Page 38:

Fully Blocking Transformations

Distinct

Sort

Aggregate Σx

Page 39:

Lookups

Distinct

SSIS’ Algorithm

Cache lookup not successful?

Add to cache

Write cache out

Start

End

Create empty cache for rows

Foreach row

Create copy of row

BS6224:1987
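The flowchart can be read as the following stand-alone sketch. It illustrates the steps as drawn, not SSIS's actual internals; the name `BlockingDistinct` is hypothetical, and plain values stand in for the copied rows.

```csharp
using System.Collections.Generic;

public static class BlockingDistinct
{
    // Collects every distinct row, and only returns once all input is consumed.
    public static List<T> Run<T>(IEnumerable<T> rows)
    {
        var cache = new HashSet<T>();   // "Create empty cache for rows"
        var ordered = new List<T>();    // keeps first-seen order for the write-out

        foreach (var row in rows)       // "Foreach row"
        {
            // HashSet.Add returns false when the row is already cached.
            if (cache.Add(row))
            {
                ordered.Add(row);
            }
        }

        return ordered;                 // "Write cache out"
    }
}
```

Nothing leaves the method until the last input row has been read, which is what makes the transformation fully blocking.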

Page 40:

Lookups

Distinct

SSIS’ Algorithm

Cache lookup not successful?

Add to cache

Write cache out

Start

End

Create empty cache for rows

Foreach row

Create copy of row

BS6224:1987

28%

Page 41:

Sorting

Is all of the data required?

Partitioning is your friend!

4,432,277 rows

44,323 rows

x100

Page 42:

Sorting

OwnerId

CollectionId

ItemId

ValidFrom

ValidTo

Partition Key

Sort Key Sort Data

SortKey implements IComparable & IComparable<Key>

Dictionary points to IEnumerable<SortData>

1 Many Many

Page 43:

Sorting

Implementation

BS6224:1987

Start

End

Foreach row

Create Partition Key for row

Create Sort Key for row

Create Sort Data for row

Is partition key of row different to current key?

Write out then clear sorter

Set current key to partition key of row

Add sort key & data to sorter

Write out then clear sorter

Page 44:

Sorting

Implementation

Pre Execute

// Drains the buffer row by row, then flushes the final partition at end of input.
public override void Input0_ProcessInput(Input0Buffer buffer)
{
    while (buffer.NextRow())
    {
        ProcessRow(buffer);
    }
    if (buffer.EndOfRowset())
    {
        WriteoutSorter();
    }
}

private readonly SortedDictionary<SortKey, IEnumerable<SortData>> sorter =
    new SortedDictionary<SortKey, IEnumerable<SortData>>();

private PartitionKey currentPartitionKey = null;
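To show how the pieces fit together, here is a stand-alone sketch of the partitioned sort, without the SSIS buffer types. It assumes rows arrive already grouped by partition key (for example, because the source query orders by it). The class name `PartitionedSorter`, the `Add` and `Flush` methods, and the `writeRow` callback are hypothetical stand-ins for `ProcessRow`, `WriteoutSorter`, and the output buffer.

```csharp
using System;
using System.Collections.Generic;

public sealed class PartitionedSorter<TPartition, TSortKey, TData>
    where TSortKey : IComparable<TSortKey>
{
    // One sort key can carry many data rows.
    private readonly SortedDictionary<TSortKey, List<TData>> sorter =
        new SortedDictionary<TSortKey, List<TData>>();
    private readonly Action<TData> writeRow;
    private TPartition currentPartition;
    private bool hasPartition;

    public PartitionedSorter(Action<TData> writeRow)
    {
        this.writeRow = writeRow;
    }

    // Called once per input row.
    public void Add(TPartition partition, TSortKey sortKey, TData data)
    {
        // A change of partition means the previous one is complete.
        if (hasPartition &&
            !EqualityComparer<TPartition>.Default.Equals(partition, currentPartition))
        {
            Flush();
        }
        currentPartition = partition;
        hasPartition = true;

        List<TData> bucket;
        if (!sorter.TryGetValue(sortKey, out bucket))
        {
            bucket = new List<TData>();
            sorter.Add(sortKey, bucket);
        }
        bucket.Add(data);
    }

    // Writes the current partition in sort-key order, then clears the sorter.
    // Must also be called once after the last row.
    public void Flush()
    {
        foreach (var pair in sorter)
        {
            foreach (var row in pair.Value)
            {
                writeRow(row);
            }
        }
        sorter.Clear();
    }
}
```

Only one partition is held in memory at a time, so the component blocks per partition rather than for the whole data set.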

Page 45:

Aggregation Σx

Pre Execute

Structure identical to sort except

One to one map between key and data

Data is now MUTABLE - Updated as aggregation progresses
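A minimal sketch of the same partition-at-a-time structure used for aggregation, assuming a count and a sum are the aggregates wanted and that rows arrive grouped by partition key. `Totals`, `PartitionedAggregator`, and the `writeRow` callback are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

// Mutable accumulator: updated in place as rows arrive.
public sealed class Totals
{
    public long Count;
    public decimal Sum;
}

public sealed class PartitionedAggregator<TPartition, TKey>
{
    // One accumulator per key (one-to-one, unlike the sorter's one-to-many).
    private readonly SortedDictionary<TKey, Totals> groups =
        new SortedDictionary<TKey, Totals>();
    private readonly Action<TKey, Totals> writeRow;
    private TPartition currentPartition;
    private bool hasPartition;

    public PartitionedAggregator(Action<TKey, Totals> writeRow)
    {
        this.writeRow = writeRow;
    }

    public void Add(TPartition partition, TKey key, decimal amount)
    {
        // A change of partition means the previous one is complete.
        if (hasPartition &&
            !EqualityComparer<TPartition>.Default.Equals(partition, currentPartition))
        {
            Flush();
        }
        currentPartition = partition;
        hasPartition = true;

        Totals totals;
        if (!groups.TryGetValue(key, out totals))
        {
            totals = new Totals();
            groups.Add(key, totals);
        }
        totals.Count++;
        totals.Sum += amount;
    }

    // Writes one row per key for the current partition, then clears the state.
    // Must also be called once after the last row.
    public void Flush()
    {
        foreach (var pair in groups)
        {
            writeRow(pair.Key, pair.Value);
        }
        groups.Clear();
    }
}
```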

Page 46:

Putting it all together

Page 47:

Putting it all together

38.890 sec

(From 01:24.672)

54%

Page 48:

Helping out SQL Server

Page 49:

Helping out SQL Server

Page 50:

Helping out SQL Server

25 HOURS! (250 million rows)

Page 51:

Helping out SQL Server

Read Keys

Page 52:

Helping out SQL Server

Read Keys

Partitioned Left Hash Lookup

Page 53:

Helping out SQL Server

Read Keys

Partitioned Left Hash Lookup

Partitioned Right Hash Lookup

Page 54:

Helping out SQL Server

Read Keys

Partitioned Left Hash Lookup

Partitioned Right Hash Lookup

Partitioned Sort (to cluster key for data)

Read Data (via cluster key)

Merge Join

90 minutes! (250 million rows)

Page 55:

Bulk Loading Partitioned Tables

P

Partitioned Table

Data Source Transform

Page 56:

Bulk Loading Partitioned Tables

Data Source

……………………….

Switch-in Tables

Physical partitions (sort order)

Time based partitioning

Page 57:

Bulk Loading Partitioned Tables

Page 58:

Bulk Loading Partitioned Tables

Destination

Page 59:

Bulk Loading Partitioned Tables

Task Parallel Library

.NET 4.5

System.Threading.Tasks.Dataflow

Data Source

Batch Manager

ActionBlock (multi threaded wrapper around a table writer utilising SqlBulkCopy)

BatchBlocks

Configure:
• Size of batches
• Number of writer threads

TPL component

Custom code
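A rough sketch of how these components might be wired together with System.Threading.Tasks.Dataflow. It is an illustration under assumptions, not the presenter's code: the class name `BatchManager`, its `Post` and `CloseAsync` methods, and the `partitionOf` and `writeBatch` callbacks are hypothetical, and `writeBatch` stands in for the table writer that would call SqlBulkCopy against the switch-in table for the batch's partition. It also assumes a single producer thread calls `Post`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public sealed class BatchManager<TRow>
{
    // One BatchBlock per partition, so every batch targets a single table.
    private readonly Dictionary<int, BatchBlock<TRow>> batchers =
        new Dictionary<int, BatchBlock<TRow>>();
    private readonly ActionBlock<TRow[]> writer;
    private readonly Func<TRow, int> partitionOf;
    private readonly int batchSize;

    public BatchManager(int batchSize, int writerThreads,
                        Func<TRow, int> partitionOf, Action<TRow[]> writeBatch)
    {
        this.batchSize = batchSize;
        this.partitionOf = partitionOf;
        // Shared writer: up to writerThreads batches are written concurrently.
        this.writer = new ActionBlock<TRow[]>(
            batch => writeBatch(batch),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = writerThreads });
    }

    // Routes a row to the BatchBlock of its partition (single producer assumed).
    public void Post(TRow row)
    {
        int partition = partitionOf(row);
        BatchBlock<TRow> batcher;
        if (!batchers.TryGetValue(partition, out batcher))
        {
            batcher = new BatchBlock<TRow>(batchSize);
            // Completion is not propagated: many sources share one target.
            batcher.LinkTo(writer);
            batchers.Add(partition, batcher);
        }
        batcher.Post(row);
    }

    // Close down: flush partial batches, then wait for the writer to drain.
    public async Task CloseAsync()
    {
        foreach (var batcher in batchers.Values)
        {
            batcher.Complete();
        }
        await Task.WhenAll(batchers.Values.Select(b => b.Completion));
        writer.Complete();
        await writer.Completion;
    }
}
```

The two settings named on the slide map onto the `batchSize` and `writerThreads` constructor arguments. The blocks here are unbounded; setting `BoundedCapacity` in the block options would be one way to apply back-pressure if the source outruns the writers.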

Page 60:

Bulk Loading Partitioned Tables

Setup

Action Block

Page 61:

Bulk Loading Partitioned Tables

Batch Manager

Data processor

Page 62:

Bulk Loading Partitioned Tables

Close down

Page 63:

Summary

KISS (Keep It Short, Simple)

New tools available but one size does not fit all

Utilise partitioning

You have the power

Page 64:

Summary

Developer’s Guide

IT DEPENDS!

Page 65:

Thank you

So, it is possible to move from the Asynchronous to(wards) the Synchronous

[email protected]

@jm99a

