Page 1

Implementing Database Operations Using SIMD Instructions

By: Jingren Zhou, Kenneth A. Ross

Presented by: Ioan Stefanovici

CSC2531: Advanced Topics in Database Systems, Fall 2011

Page 2

The Problem

Databases have become bottlenecked on CPU and memory performance

Need to fully utilize available architectures’ features to maximize performance

Cache performance

e.g.: cache-conscious B+ trees, PAX, etc.

Proposal: use SIMD instructions

Pages 3-4

Single-Instruction, Multiple-Data (SIMD)

X:       X0        X1        X2        X3
Y:       Y0        Y1        Y2        Y3
         OP        OP        OP        OP        (same operation in every lane)
Result:  X0 OP Y0  X1 OP Y1  X2 OP Y2  X3 OP Y3

Let S = #operands (degree of parallelism)
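A minimal sketch of the picture above, emulating the lanes in plain Python rather than using real SIMD hardware. The helper name `simd_op` and the choice S = 4 are assumptions made for illustration only.

```python
import operator

S = 4  # assumed number of operands per SIMD register


def simd_op(op, x, y):
    """Emulate one SIMD instruction: the same op is applied to every lane pair."""
    assert len(x) == len(y) == S
    return [op(a, b) for a, b in zip(x, y)]


# One "instruction" produces S results at once.
total = simd_op(operator.add, [1, 2, 3, 4], [10, 20, 30, 40])
```

On real hardware the S lane operations happen in a single instruction; the list comprehension here only models the data flow.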

Page 5

Single-Instruction, Multiple-Data (SIMD)

Focus

Goal

Achieve speed-ups close to (or even higher than) S, the degree of parallelism

Page 6

Outline

Motivation & Problem Statement

SIMD Instructions and Implementation Details

Algorithm Improvements:

Scan algorithms

Index traversals

Join algorithms

Page 7

A few points...

Compiler auto-parallelization is difficult

Explicit use of SIMD instructions

SIMD data alignment

Column-oriented storage

Targets

Scan-like operations

Index traversals

Join algorithms

Pages 8-9

Comparison Result Example

Want to perform: X < Y

X:                0x00000001  0x00000003  0x00000004  0x00000007
Y:                0x00000002  0x00000003  0x00000005  0x00000006
X < Y:            0xFFFFFFFF  0x00000000  0xFFFFFFFF  0x00000000
SIMD_bit_vector:  1           0           1           0
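The comparison example can be reproduced with a small scalar emulation. This is a sketch, not the hardware instruction: `simd_less_than` is a hypothetical helper, and the bit order of `simd_bit_vector` (first lane in the highest bit) is an assumption chosen to match the "jth bit" convention used in the scan pseudocode of this deck.

```python
ALL_ONES = 0xFFFFFFFF  # a 32-bit lane with every bit set stands for "true"


def simd_less_than(x, y):
    """Lane-wise X < Y: all ones where the comparison holds, zero elsewhere."""
    return [ALL_ONES if a < b else 0 for a, b in zip(x, y)]


def simd_bit_vector(mask):
    """Pack one bit per lane into an integer, first lane in the highest bit."""
    v = 0
    for lane in mask:
        v = (v << 1) | (lane & 1)
    return v


x = [0x1, 0x3, 0x4, 0x7]
y = [0x2, 0x3, 0x5, 0x6]
mask = simd_less_than(x, y)   # lanes 1 and 3 are true
v = simd_bit_vector(mask)     # bits 1 0 1 0
```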

Page 10

Scan

Typical scan:

for i = 1 to N {
    if (condition(x[i])) then process1(y[i]);
    else process2(y[i]);
}

[Figure: two columns, x (condition) holding x1 ... x6 ... and y (data) holding y1 ... y6 ...]

Page 11

SIMD Scan

Typical SIMD scan:

for i = 1 to N step S {
    Mask[1..S] = SIMD_condition(x[i..i+S-1]);
    SIMD_Process(Mask[1..S], y[i..i+S-1]);
}

[Figure: two columns, x (condition) holding x1 ... x6 ... and y (data) holding y1 ... y6 ...]

For S=4
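The SIMD scan loop can be sketched as follows, again emulating lanes with Python lists. The predicate (`x < threshold`), the callback `collect`, and the requirement that N be a multiple of S are all assumptions for this illustration; the slides leave the condition and the processing step abstract.

```python
S = 4  # assumed SIMD width


def simd_condition(xs, threshold):
    """Hypothetical predicate: one mask entry per lane, 1 where x < threshold."""
    return [1 if x < threshold else 0 for x in xs]


def simd_scan(x, y, threshold, simd_process):
    """Visit the columns S elements at a time (assumes len(x) is a multiple of S)."""
    for i in range(0, len(x), S):
        mask = simd_condition(x[i:i + S], threshold)
        simd_process(mask, y[i:i + S])


selected = []


def collect(mask, ys):
    # Hypothetical SIMD_Process: keep the y values whose lane is set.
    selected.extend(v for m, v in zip(mask, ys) if m)


simd_scan([5, 1, 9, 2, 7, 0, 3, 8], list("abcdefgh"), 4, collect)
```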

Page 12

Scan: Return First Match

SIMD Return First Match

SIMD_Process(mask[1..S], y[1..S]) {
    V = SIMD_bit_vector(mask);  /* V = number between 0 and 2^S-1 */
    if (V != 0) {
        for j = 1 to S
            if ( (V >> (S-j)) & 1 )  /* jth bit */
                { result = y[j]; return; }
    }
}
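A runnable rendering of the first-match step, as a sketch under the same assumptions as before (S = 4, lanes emulated with lists, first lane stored in the highest bit of V). Returning `None` when nothing matches is a choice made here, not something the slide specifies.

```python
S = 4  # assumed SIMD width


def simd_bit_vector(mask):
    """Pack one bit per lane into an integer, first lane in the highest bit."""
    v = 0
    for lane in mask:
        v = (v << 1) | (1 if lane else 0)
    return v


def first_match(mask, y):
    """Return the first y[j] whose mask lane is set, or None if no lane matches."""
    v = simd_bit_vector(mask)
    if v != 0:  # a single test skips the whole unit when nothing matched
        for j in range(1, S + 1):
            if (v >> (S - j)) & 1:  # jth bit
                return y[j - 1]
    return None


hit = first_match([0, 0, 1, 1], [10, 20, 30, 40])
miss = first_match([0, 0, 0, 0], [10, 20, 30, 40])
```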

Page 13

Scan: Return All Matches

SIMD All Matches Alternative 1

SIMD_Process(mask[1..S], y[1..S]) {
    V = SIMD_bit_vector(mask);  /* V = number between 0 and 2^S-1 */
    if (V != 0) {
        for j = 1 to S
            if ( (V >> (S-j)) & 1 )  /* jth bit */
                { result[pos++] = y[j]; }
    }
}

SIMD All Matches Alternative 2

SIMD_Process(mask[1..S], y[1..S]) {
    V = SIMD_bit_vector(mask);  /* V = number between 0 and 2^S-1 */
    if (V != 0) {
        for j = 1 to S {
            tmp = (V >> (S-j)) & 1;  /* jth bit */
            result[pos] = y[j];
            pos += tmp;
        }
    }
}
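The difference between the two alternatives is easier to see side by side. The sketch below assumes S = 4 and takes the bit vector V directly as input; the function names are hypothetical. Alternative 2 always writes `y[j]` and only advances `pos` on a match, so it trades the per-element branch for a possibly wasted store and needs S free slots after `pos`.

```python
S = 4  # assumed SIMD width


def all_matches_branching(v, y, result):
    """Alternative 1: test each bit and append only on a match."""
    for j in range(1, S + 1):
        if (v >> (S - j)) & 1:
            result.append(y[j - 1])


def all_matches_branch_free(v, y, result, pos):
    """Alternative 2: always store, advance pos by the bit value.

    result must have at least pos + S slots; returns the new pos.
    """
    for j in range(1, S + 1):
        tmp = (v >> (S - j)) & 1
        result[pos] = y[j - 1]   # overwritten later if this lane did not match
        pos += tmp
    return pos


y = [10, 20, 30, 40]
out1 = []
all_matches_branching(0b1010, y, out1)

buf = [None] * S
end = all_matches_branch_free(0b1010, y, buf, 0)
out2 = buf[:end]                 # entries past end are leftovers, not results
```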

Page 14

Scan: Return All Matches Performance

Page 15

Index Structures (B+ trees)

[Figure: B+ tree (Source: Wikipedia), annotated "Height" and "Log2 (n)"]

[Figure: Example of a B+ -tree internal node]

Page 16

Internal Node Search

5 Ways to Search

Binary Search (SISD)

SIMD Binary Search

SIMD Sequential Search 1

SIMD Sequential Search 2

Hybrid Search

Pages 17-19

Internal Node Search

Naive SIMD Binary Search (looking for “4”)

Keys: 1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32

First probe, comparison mask: 0 0 0 0

Second probe, comparison mask: 0 1 0 0 Got it!

Pages 20-24

Internal Node Search

SIMD Sequential Search 1 (looking for “4”)

Keys: 1 3 4 5 | 7 8 10 13 | 14 17 19 20 | 23 24 25 32

Unit 1 (1 3 4 5):      mask (≤ 4) = 1 1 1 0   Total ≤ 4: 3
Unit 2 (7 8 10 13):    mask (≤ 4) = 0 0 0 0   Total ≤ 4: 3
Unit 3 (14 17 19 20):  mask (≤ 4) = 0 0 0 0   Total ≤ 4: 3
Unit 4 (23 24 25 32):  mask (≤ 4) = 0 0 0 0   Total ≤ 4: 3 Got it!
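The trace above amounts to counting, unit by unit, how many keys are ≤ the search key, with no early exit. A minimal emulation, assuming S = 4 and sorted keys; `simd_count_le` is a hypothetical stand-in for a SIMD compare followed by counting the set bits.

```python
S = 4  # assumed SIMD width


def simd_count_le(unit, key):
    """Emulates a SIMD compare (k <= key) plus a count of the true lanes."""
    return sum(1 for k in unit if k <= key)


def sequential_search_1(keys, key):
    """Scan every SIMD unit of the node and total the keys <= key."""
    total = 0
    for i in range(0, len(keys), S):
        total += simd_count_le(keys[i:i + S], key)
    return total


keys = [1, 3, 4, 5, 7, 8, 10, 13, 14, 17, 19, 20, 23, 24, 25, 32]
total_le = sequential_search_1(keys, 4)
```

The total identifies the position in the node; there is no data-dependent branch inside the loop, which is the point of this variant.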

Pages 25-27

Internal Node Search

SIMD Sequential Search 2 (looking for “4”)

Keys: 1 3 4 5 | 7 8 10 13 | 14 17 19 20 | 23 24 25 32

Unit 1 (1 3 4 5):  mask (≤ 4) = 1 1 1 0   Total ≤ 4: 3

Is there a key > the search key in the SIMD unit? Yes! Got it!

Pro: processes fewer keys (50% fewer on average)

Con: extra conditional test
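A sketch of the early-exit variant, under the same assumptions (S = 4, sorted keys, lanes emulated in Python). The function also returns how many units it touched, which is an addition for illustration so the saving is visible.

```python
S = 4  # assumed SIMD width


def sequential_search_2(keys, key):
    """Count keys <= key, stopping at the first unit that holds a larger key.

    Returns (total, units_processed).
    """
    total = 0
    units = 0
    for i in range(0, len(keys), S):
        unit = keys[i:i + S]
        le = sum(1 for k in unit if k <= key)  # emulated compare + count
        total += le
        units += 1
        if le < len(unit):  # the extra conditional test: some key > search key
            break
    return total, units


keys = [1, 3, 4, 5, 7, 8, 10, 13, 14, 17, 19, 20, 23, 24, 25, 32]
found_4 = sequential_search_2(keys, 4)
found_13 = sequential_search_2(keys, 13)
```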

Pages 28-30

Internal Node Search

Hybrid Search (looking for “4”)

1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32 ...

Pick some L (say L = 3)

Step 1: Binary search on the last element of each “segment”

Step 2: Sequential SIMD scan inside the correct segment
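The two steps can be sketched as below. How L maps to a segment size is not spelled out on the slide, so this sketch takes a hypothetical parameter `seg_len` counted in keys; S = 4 and sorted keys are also assumptions. The binary search uses Python's standard `bisect` module.

```python
from bisect import bisect_right

S = 4  # assumed SIMD width


def hybrid_search(keys, key, seg_len):
    """Return the number of keys <= key (keys must be sorted)."""
    n = len(keys)
    # Step 1: binary search over the last key of each segment.
    lasts = [keys[min(i + seg_len, n) - 1] for i in range(0, n, seg_len)]
    seg = bisect_right(lasts, key)  # first segment whose last key is > key
    if seg == len(lasts):
        return n                    # every key is <= the search key
    start = seg * seg_len
    end = min(start + seg_len, n)
    total = start                   # all keys in earlier segments are <= key
    # Step 2: sequential SIMD-style scan inside that one segment.
    for i in range(start, end, S):
        unit = keys[i:min(i + S, end)]
        total += sum(1 for k in unit if k <= key)
    return total


keys = [1, 3, 4, 5, 7, 8, 10, 13, 14, 17, 19, 20, 23, 24, 25, 32]
pos_4 = hybrid_search(keys, 4, seg_len=8)
pos_20 = hybrid_search(keys, 20, seg_len=8)
```

Larger segments mean fewer binary-search steps but a longer sequential scan, so the segment size is a tuning knob.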

Page 31

Internal Node Search Performance

Page 32

Internal Node Search – Branch Misprediction

Page 33

Nested Loop Join – O(n^2)

Nested Loop

[Figure: two columns of join keys labeled Outer Loop and Inner Loop; extracted values: 2 4 1 16 9 3 18 2 34 80 5 4 80 8 9 7 10]

Page 34

Nested Loop Join – O(n^2)

SISD Algorithm

Outer Loop: iterate 1 at a time

Inner Loop: iterate 1 at a time

[Figure: Outer Loop and Inner Loop key columns]

Page 35

Nested Loop Join – O(n^2)

SIMD Duplicate-Outer

Outer Loop: fix & duplicate S times

Inner Loop: iterate S at a time

[Figure: Outer Loop and Inner Loop key columns]
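A sketch of duplicate-outer for an equality join, emulating lanes with lists. S = 4, the equality predicate, and an inner input whose length is a multiple of S are assumptions; the sample keys are made up for the example.

```python
S = 4  # assumed SIMD width


def join_duplicate_outer(outer, inner):
    """Equi-join: each outer key is duplicated into all S lanes and compared
    against S inner keys at a time (assumes len(inner) is a multiple of S)."""
    pairs = []
    for r in outer:
        dup = [r] * S                            # outer key in every lane
        for i in range(0, len(inner), S):
            unit = inner[i:i + S]
            mask = [a == b for a, b in zip(dup, unit)]   # one SIMD compare
            pairs.extend((r, s) for m, s in zip(mask, unit) if m)
    return pairs


matches = join_duplicate_outer([2, 4, 1, 16], [2, 34, 80, 5, 4, 80, 8, 9])
```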

Page 36

Nested Loop Join – O(n^2)

SIMD Duplicate-Inner

Outer Loop: iterate S at a time

Inner Loop: fix & duplicate S times

[Figure: Outer Loop and Inner Loop key columns]

Page 37

Nested Loop Join – O(n^2)

SIMD Rotate-Inner (Rotate & Compare S times)

Outer Loop: iterate S at a time

Inner Loop: iterate S at a time

[Figure: Outer Loop and Inner Loop key columns]
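Rotate-inner can be sketched the same way: load S outer keys and S inner keys, then compare and rotate the inner unit S times so every outer lane meets every inner key exactly once. S = 4, the equality predicate, and inputs whose lengths are multiples of S are assumptions of this emulation.

```python
S = 4  # assumed SIMD width


def join_rotate_inner(outer, inner):
    """Equi-join using S rotations per pair of SIMD units
    (assumes both input lengths are multiples of S)."""
    pairs = []
    for i in range(0, len(outer), S):
        r_unit = outer[i:i + S]
        for j in range(0, len(inner), S):
            s_unit = inner[j:j + S]
            for _ in range(S):
                # One lane-wise compare of the two units.
                pairs.extend((r, s) for r, s in zip(r_unit, s_unit) if r == s)
                # Rotate the inner unit by one lane.
                s_unit = s_unit[1:] + s_unit[:1]
    return pairs


rotated = join_rotate_inner([2, 4, 1, 16], [2, 34, 80, 5, 4, 80, 8, 9])
```

Unlike the duplicate variants, no key has to be broadcast into all lanes; the cost moves into the rotate step.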

Page 38

Nested Loop Join – Performance

Queries

Q1. SELECT ... FROM R, S WHERE R.Key = S.Key (integer)
Q2. SELECT ... FROM R, S WHERE R.Key = S.Key (floating-point)
Q3. SELECT ... FROM R, S WHERE R.Key < S.Key < 1.01 * R.Key
Q4. SELECT ... FROM R, S WHERE R.Key < S.Key < R.Key + 5

Page 39

Nested Loop Join Branch Misprediction

Page 40

Conclusion

Thank you!

Questions?

