+ All Categories
Home > Documents > Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Date post: 26-Dec-2015
Category:
Upload: joshua-wiggins
View: 221 times
Download: 2 times
Share this document with a friend
Popular Tags:
37
Example
Transcript
Page 1: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Example

Page 2: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Bulk Nested-Loop JoinsUsing Buffers: e.g. 22 blocks

Page 3: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.
Page 4: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.
Page 5: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.
Page 6: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.
Page 7: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.
Page 8: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort Merge Joins

Notes From Winter 2015

Page 9: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

VisualizationsSailors

Page 1

Page 2

Page 3

Page 4

Page 10: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

VisualizationsSailors

Record 1Record 2Record 3Record 4Record 5

Page 2

Page 3

Page 4

Page 11: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

VisualizationsSailors

Record 1Record 2Record 3Record 4Record 5

Page 2

Page 3

Page 4

Reserves

Page 12: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Simple Nested Loops JoinKey idea:

Take each record of S and match it with each record of R.

Steps:1. Get tuple of S.2. Iterate through

each tuple in R.

Sailors Reserves

Page 13: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Simple Nested Loops JoinKey idea:

Take each record of S and match it with each record of R.

Steps:1. Get tuple of S.2. Iterate through

each tuple in R.

Sailors Reserves

(name = Bob, sid = 1)

Page 14: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Simple Nested Loops JoinKey idea:

Take each record of S and match it with each record of R.

Steps:1. Get tuple of S.2. Iterate through

each tuple in R.

Sailors Reserves

(name = Bob, sid = 1) (sid = 3, bid = 6)(sid = 1, bid = 4)

(name = Bob, sid = 1, bid = 4)Output:

Page 15: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Simple Nested Loops JoinKey idea:

Take each record of S and match it with each record of R.

Steps:1. Get tuple of S.2. Iterate through

each tuple in R.

Sailors Reserves

(name = Bob, sid = 1) (sid = 3, bid = 6)(sid = 1, bid = 4)

(name = Bob, sid = 1, bid = 4)Output:

(sid = 1, bid = 7)

(name = Bob, sid = 1, bid = 7)

Page 16: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

Page 17: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Sam, sid = 3)(name = Sue, sid = 7)

(name = Jill, sid = 2)(name = Joe, sid = 12)

(name = Sue, sid = 8)

(name = Yue, sid = 4)

Page 18: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

Page 19: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

Page 20: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)Output:

Page 21: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)

Output:

Page 22: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)

Output:

Page 23: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

Page 24: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

. . .

Page 25: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

. . .

Page 26: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

. . .

Page 27: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

. . .

Page 28: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

. . .

Page 29: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)

Output:

. . .

Page 30: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Sort-Merge JoinKey idea:

Sort S and R on join column, then merge them!

Steps:1. Sort S and R.2. “Zip” or merge.I/Os:

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

Page 31: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Optimizing Sort-Merge JoinKey idea:

Internal Sort on both. Perform merge on all runs!

Steps:1. Internal sort S

and R. (Pass 0)2. Merge all runs.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)

(name = Sue, sid = 8)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)

. . .

Page 32: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Optimizing Sort-Merge JoinKey idea:

Internal Sort on both. Perform merge on all runs!

Steps:1. Internal sort S

and R. (Pass 0)2. Merge all runs.

Sailors Reserves

(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Yue, sid = 4)(name = Sue, sid = 8)

(name = Jack, sid = 18)

(name = Cat, sid = 22). . .

(name = Sam, sid = 3)(name = Sue, sid = 7)(name = Joe, sid = 12)

. . .

(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 4, bid = 3)(sid = 8, bid = 1)

(sid = 8, bid = 13)

(sid = 12, bid = 1). . .

(sid = 3, bid = 6)(sid = 8, bid = 15)

. . .

Page 33: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

External Sorting

• Classic interview question: how to sort if data don’t fit in memory.

Page 34: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

External Sorting: 2-Way Merge Sort

Page 35: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

What is a sorted run?(name = Bob, sid = 1)(name = Jill, sid = 2)

(name = Sam, sid = 3)

(name = Sue, sid = 6)(name = Kev, sid = 8)(name = Jack, sid = 9)

(name = Joe, sid = 10)(name = Sid, sid = 12)(name = Sal, sid = 15)

(name = Bit, sid = 1)(name = Bat, sid = 2)(name = Tam, sid = 3)

(name = Foo, sid = 6)(name = Bar, sid = 8)

(name = Bam, sid = 9)

(name = Ke, sid = 10)(name = Kay, sid = 12)(name = Al, sid = 15)

A sorted subset of a table.Above we have pages with tuple size = 3There are two sorted runs both with a length of 3 pages. (We denote the size of a sorted run by how many pages the sorted run spans)

Page 36: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

2 Way Merge Sort

• Given N pages of tuples to sort• B = 3 case• Pass 0: Quicksort a single page• 2-way merge sort: how many passes?– 1+ceil(log_2(N)) – 1 for pass 0, ceil(log_2(N)) because pass 0 has N runs

and at each level we merge 2 runs together– ceil of this because the number of runs in pass 0 might

not be a a power of two, but we still need to keep merging until we're down to 1 run,

Page 37: Example. Bulk Nested-Loop Joins Using Buffers: e.g. 22 blocks.

Generalized Sort Merge


Recommended