+ All Categories
Home > Documents > A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

Date post: 11-Jan-2016
Category:
Upload: hugo-henderson
View: 214 times
Download: 1 times
Share this document with a friend
18
A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu
Transcript
Page 1: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

A Relational Algebra Processor

6.375 Final Project

Ming Liu, Shuotao Xu

Page 2: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

2

Motivation

Today’s Database Management Systems (DBMS): software running on a standard operating system on a general purpose CPU

DBMS frequently used in analytics and scientific computing, but bottlenecked by: Processor speed, software overhead, latency &

bandwidth Proposal: FPGA Based Relational Algebra

ProcessorHost PC

(DBMS)

FPGARelational Algebra

Processor

Physical Storage

Page 3: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

3

Background|Relational Algebra (RA) Many database queries are fundamentally

decomposable to five basic RA operators Although SQL is capable of much more

Operator Functions

Selection Filter rows based on a Boolean condition

Projection Eliminate selected attributes (columns) of a table; remove duplicated results

Cartesian Product

Combine several tables with unique attributes

Union Combine several tables with the same attributes

Difference Select rows of several tables where the rows do not match

Design dedicated processors on the FPGA for each operator

Page 4: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

4

Project Goal

Design and implement an in-memory relational algebra processor on the FPGA

Explore the types of queries that can benefit from FPGA acceleration

Secondary: Outperform SQLite!

Some assumptions: 32-bit wide table entries Tables fit in memory Max number of columns is 32 Read only

Page 5: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

5

Microarchitecture | Host Software

FPGA

Page 6: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

6

Microarchitecture | Top-Level RAProcessor

Host PC(C++

functions)

RA Processo

rDRAM

PCIe

Host PC(DBMS)

RA Processo

r

Physical

Storage

Page 7: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

7

Microarchitecture | Row Marshaller

Exposes a simple interface for operators to access tables in DRAM

Address translation, burst aggregation, truncation & alignment

Multiplexes requests Table values

sent/received as 32-bit bursts

Page 8: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

8

Microarchitecture | Selection

Filters rows based on predicates (e.g. age < 40)

16 predicate evaluators Internally comparators

A tree of gates to qualify the predicates Max: 4 ORs of 4 ANDs

Page 9: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

9

Microarchitecture | Projection

Select columns of a table Column mask one-hot encoded Do not need to buffer row; operate directly on

data bursts

Page 10: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

10

Microarchitecture | Binary Operators

Cartesian Product, Union, Difference and Deduplication

Nested loop implementation

Page 11: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

11

Microarchitecture|Inter-operator Bypassing

Operators enabled concurrently; data passed between operators No intermediate storage

Conditions: 1. A singly link of unary operators2. Each operator has a single

target output3. No structural hazard

Software reorders and schedules the RA commands Data source/destination encoded

in command

Page 12: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

12

Microarchitecture|Inter-operator Bypassing

Multiple 32-bit wide output FIFOs to other operators

Page 13: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

13

Implementation Evaluation

Timing Maximum Frequency: 55.786MHz Critical Path: Row Marshaller mux

Area Slice Registers: 50% LUTs: 85% BRAM/FIFOs: 47%

Modules Slice Registers LUTs BRAM/FIFOs TOTAL 34649 (50%) 59328 (85%) 71 (47%) Row Mashaller 2804 6627 0 Controller 4570 6277 29 Selection 3137 19633 0 Projection 739 654 0 Cartesian Product

1935 1478 0

Union 1939 1983 0 Difference 1875 1949 0 Deduplication 1822 1970 0

Page 14: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

14

Performance Benchmark | Setup SQLite

Internal SQLite timer to report execution time of the query Thinkpad T430, Core i7-3520M @ 2.90Ghz, 1x8GB DDR3-1600

RA Processor Performance counters: cycles from start to ack of an operator

Table Relational Algebra Query SQL Query1 table100k x 30

SELECT,starLong,tableOut, mass,>,80000,AND,pos_x,>,10, OR,pos_x,<,pos_z, OR,col12,>,col14, AND,col20,<,col21

SELECT * FROM starLongWHERE mass > 80000 AND pos_x > 10 OR pos_x < pos_z OR col12 > col14 AND col20 < col21;

1 table100k x 30

PROJECT,starLong,tableOut,pos_x,col19,col25,col29

SELECT pos_x,col19, col25, col29 FROM starLong;

2 tables1k x 30

UNION,starMed1,starMed2,starUnion SELECT * FROM starMed1 UNION SELECT * FROM starMed2;

2 tables1k x 30

XPROD,starMed1,starMed2,starXprodRENAME,starXprod,0,iOrder0,1,mass0,8,

phi0SELECT,starXprod,starFiltered,

iOrder0,=,iOrder, AND,phi0,>,1,AND,mass0,>,mass

PROJECT,starFiltered,starOut,mass0

SELECT s1.mass FROM starMed1 s1, starMed2 s2WHERE s1.vx > s2.vx AND s1.phi > 1 AND s1.mass > s2.mass;

Page 15: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

15

Performance Benchmark | Results

Select Project Union Difference Xprod Dedup Complex Join0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

Query Execution Time

FPGA RA Processor SW SQLite

Query

Tim

e (s

) -

Lo

wer

is

bet

ter

Limitation: Memory Bandwidth: 200MB/s vs 12.8GB/s

Page 16: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

16

Performance Benchmark | Results

1 2 4 8 160

0.02

0.04

0.06

0.08

0.1

0.12

Select (Filter) Execution Time with Varying Number of Predicates

FPGA RA Processor SW SQLite

Number of Predicates

Tim

e (s

) -

Lo

wer

is

bet

ter

Select operator most competitive with SQLite What happens with more predicates?

Page 17: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

17

Improvements

Increasing data burst width 32-bit to 256-bit: potential 8x

speedup Area/critical path increase

Maximizing memory bandwidth Additional row buffers to buffer

data from DDR2 Memory

Larger, faster DRAM; Higher clock speed

Page 18: A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

18

Conclusion & Future Work

Complex filtering operations performs well on the FPGA Better than SQLite with sufficient memory

bandwidth Data intensive operators do not perform well Future opportunities:

An accelerator alongside SQLite Integration with HDD/SSD controller


Recommended