Page 1

Prof. Stefan Keller, IFS / Geometa Lab HSR

(Slides © CC-BY)

PostgreSQL as GPU Database

for Real-Time Analytics

Talk, Swiss PUG, Zürich, 9 November 2017

Page 2

About Scalability

Scale-up

Vertical

Add more HW components (homogeneous or heterogeneous)

Expensive(?)

No open source, platform lock-in(?)

Scale-out

Horizontal

Cheap commodity HW as "nodes"

Flexibly add more nodes

Open source

Need to relax constraints, even ACID (BASE)?

2 Stefan Keller, "PostgreSQL as GPU Database..."

Page 3

GPU Databases

Page 4

GPU Databases

More and more GPUs and memory bandwidth…

Use Cases:

Analytical - not transactional

OLTP + OLAP = Hybrid transactional/analytical processing (HTAP) => No need to move data to a warehouse

Setting:

Single-node => much simpler to maintain

Discrete GPU (rather than FPGA, speciality chips)

CPU vs. GPU:

CPU is suited for low latency, complex data + ops

GPU is suited for throughput of homogeneous ops

Page 5

GPU Database Reference Architecture

Master of Science in Engineering (MSE)

Page 6

Paper

Paper by Breß, Heimel, Siegmund, Bellatreche, Saake (Universities of Magdeburg, Berlin, Passau, Futuroscope/France) on

"GPU-accelerated database systems: Survey and open challenges", in Transactions on Large-Scale Data- and Knowledge-Centered Systems XV. Springer Berlin Heidelberg, 2014. Pages 1-35. Weblink: http://bit.ly/1rMOuZC (pdf)

Contents:

Design Choices

Evaluation of 8 GDBMS

Reference architecture

Insights for all co-processors

Page 7

Overview

Exemplary architecture of a system with a graphics card:

Page 8

Architecture of GPU-aware DBMSs

Design choices/space of GPU-aware DBMSs

Page 9

PG-Strom / PostgreSQL

Page 10

PostgreSQL - www.postgresql.org

“The world's most advanced open source database”.

Open source: PostgreSQL License (similar to BSD/MIT)

PostgreSQL 10 Released October 2017 (since 2002)

Fully ACID compliant object-relational database system

Reputation for reliability, data integrity, and correctness

Broad community

Runs on all major operating systems

Broad support of SQL and data types

Scalable in quantity of data and concurrent users

Extensible: Modules (EXTENSION, Network), Foreign Data Wrappers (SQL/MED), Language APIs

Page 11

PG-Strom

PG-Strom - http://strom.kaigai.gr.jp/ - Version 1.0

“Limit breaker of PostgreSQL”

Extension module to accelerate SQL workloads using multi-thousands cores and high bandwidth memory. Open source GPLv2.

Requirements: PostgreSQL 9.5, CUDA

Main use cases

In-database analytics: real-time statistics

Rapid batch processing: ETL/ELT

Main SW architecture design decisions:

Heterogeneous scale-up

On-the-fly native GPU code generation

Asynchronous pipeline execution mode
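On-the-fly native GPU code generation means emitting device code specialized for each query instead of interpreting expressions row by row. The following is a minimal Python sketch of that idea only, not PG-Strom's actual generator: it assumes a hypothetical expression tree (Col, Const, BinOp), float columns passed as kernel arguments, and a hypothetical kernel name scan_qual, and it renders a WHERE clause as CUDA C source text in which each GPU thread evaluates the qualifier for one row.

```python
# Hypothetical sketch of on-the-fly code generation: render a WHERE-clause
# expression tree as CUDA C source. Not PG-Strom's actual generator.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Col:
    name: str  # assumed to be a float column passed as a kernel argument


@dataclass
class Const:
    value: float


@dataclass
class BinOp:
    op: str  # C operator, e.g. "+", "*", "<", ">", "&&", "||"
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, BinOp]


def emit_expr(e: Expr) -> str:
    """Recursively render an expression node as a C expression."""
    if isinstance(e, Col):
        return f"{e.name}[i]"
    if isinstance(e, Const):
        return repr(float(e.value))
    if isinstance(e, BinOp):
        return f"({emit_expr(e.left)} {e.op} {emit_expr(e.right)})"
    raise TypeError(f"unsupported node: {e!r}")


def columns(e: Expr) -> List[str]:
    """Collect referenced column names, first occurrence first."""
    if isinstance(e, Col):
        return [e.name]
    if isinstance(e, BinOp):
        seen = columns(e.left)
        return seen + [c for c in columns(e.right) if c not in seen]
    return []


def gen_filter_kernel(where: Expr, kernel: str = "scan_qual") -> str:
    """Emit a kernel where each GPU thread evaluates the qualifier for one row."""
    params = [f"const float *{c}" for c in columns(where)]
    params += ["unsigned char *match", "int nrows"]
    sig = ", ".join(params)
    return (
        f"__global__ void {kernel}({sig}) {{\n"
        f"    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if (i < nrows) match[i] = {emit_expr(where)} ? 1 : 0;\n"
        f"}}\n"
    )
```

For instance, gen_filter_kernel(BinOp("&&", BinOp(">", Col("price"), Const(100)), BinOp("<", Col("qty"), Const(50)))) yields a kernel taking price and qty arrays whose body sets match[i] = ((price[i] > 100.0) && (qty[i] < 50.0)) ? 1 : 0. A real system would still have to compile this source at runtime and launch it; that step is omitted here.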

Page 12

PG-Strom: SW architecture

Page 13

PG-Strom: Overview

Source: http://strom.kaigai.gr.jp/

Page 14

PG-Strom: Overview (cont.)

Page 15

PG-Strom: Features - Data types

Data Types:

Numeric: …; Date/Time: …; Others: bool, money

Text: …

Limits on text and varchar(x)

=> "GPU cannot process compressed or TOAST'ed data"

=> "ALTER TABLE ... SET STORAGE PLAIN" or MAIN

Not supported:

geometry, geography (PostGIS)

See Reference:Data Types for details

Internals:

Custom Scan Provider, see www.postgresql.org/docs/current/static/custom-scan.html
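The storage workaround on this slide (ALTER TABLE ... SET STORAGE PLAIN or MAIN) is applied per column. A small, hypothetical Python helper that builds the statements for a list of text/varchar columns might look like this; the function names, table and column names are placeholders, and quoting each name makes it case-sensitive (a schema-qualified name would have to be split and quoted part by part).

```python
# Hypothetical helper: emit per-column ALTER TABLE statements that change the
# storage strategy of variable-length (text/varchar) columns.
from typing import List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def storage_statements(table: str, cols: List[str], strategy: str = "PLAIN") -> List[str]:
    """Return one ALTER TABLE ... SET STORAGE statement per column."""
    strategy = strategy.upper()
    if strategy not in ("PLAIN", "MAIN"):
        raise ValueError("strategy must be PLAIN or MAIN")
    return [
        f"ALTER TABLE {quote_ident(table)} "
        f"ALTER COLUMN {quote_ident(col)} SET STORAGE {strategy};"
        for col in cols
    ]
```

Per the PostgreSQL documentation, PLAIN disables both compression and out-of-line (TOAST) storage, while MAIN still allows compression. SET STORAGE only sets the strategy for future table updates; it does not rewrite rows that are already stored.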

Page 16

PG-Strom: Features – SQL workloads

Full Table Scan

With scan qualifiers, the GPU evaluates the scan qualifiers and filters out invisible rows…

Tables Join

Parallel versions of the hash-join algorithm and of the simple (non-parameterized) nested-loop algorithm are supported…

Group By/Aggregation

The GPU pre-processes aggregate operations to reduce the number of rows the CPU has to process…

Projection

When an SQL query contains complicated mathematical formulas, the GPU calculates these expressions on the device, and the CPU just references the calculated results
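The four workloads above can be pictured as stages applied batch by batch. The sketch below is a CPU-only Python simulation under stated assumptions, not PG-Strom code: each chunk stands in for a batch of rows a GPU would process in parallel, the tables, column names (dim_id, amount, discount, region) and functions are hypothetical, and the join is a plain hash build/probe inner join. The final merge of per-chunk partial results represents the CPU work that the pre-aggregation shrinks.

```python
# CPU-only simulation of scan filter, hash join, projection and
# pre-aggregation as batch stages. Hypothetical sketch: chunks stand in
# for GPU batches; names and schema are illustrative only.
from collections import defaultdict


def scan_filter(chunk, qual):
    """Evaluate the scan qualifier per row and drop non-matching rows early."""
    return [row for row in chunk if qual(row)]


def build_hash(inner, key):
    """Build phase of a hash join: index the (smaller) inner table by key."""
    table = defaultdict(list)
    for row in inner:
        table[row[key]].append(row)
    return table


def probe(rows, table, key):
    """Probe phase: emit one merged row per matching inner row (inner join)."""
    return [{**r, **m} for r in rows for m in table.get(r[key], [])]


def partial_agg(rows, group_key, value):
    """Pre-aggregation: collapse a chunk to one (sum, count) pair per group."""
    acc = {}
    for r in rows:
        s, n = acc.get(r[group_key], (0, 0))
        acc[r[group_key]] = (s + r[value], n + 1)
    return acc


def final_agg(partials):
    """CPU side: merge the small per-chunk results into final SUM and AVG."""
    total = {}
    for part in partials:
        for g, (s, n) in part.items():
            ts, tn = total.get(g, (0, 0))
            total[g] = (ts + s, tn + n)
    return {g: {"sum": s, "avg": s / n} for g, (s, n) in total.items()}


def run_query(fact_rows, dim_rows, chunk_size=4):
    """Roughly: SUM/AVG of net per region, joined on dim_id, WHERE amount > 0."""
    table = build_hash(dim_rows, "dim_id")
    partials = []
    for start in range(0, len(fact_rows), chunk_size):
        chunk = fact_rows[start:start + chunk_size]
        rows = scan_filter(chunk, lambda r: r["amount"] > 0)
        rows = probe(rows, table, "dim_id")
        # Projection: compute the expression once per row, next to the data.
        rows = [{**r, "net": r["amount"] * (1 - r["discount"])} for r in rows]
        partials.append(partial_agg(rows, "region", "net"))
    return final_agg(partials)
```

For example, with fact rows (dim_id 1, amount 100), (dim_id 2, amount 50, discount 0.5), a negative amount that the qualifier drops, an unmatched dim_id 3, and (dim_id 2, amount 30), plus dimension rows mapping 1 to north and 2 to south, run_query returns a sum of 100.0 for north and 55.0 (average 27.5) for south, whatever chunk_size is used. Only the per-chunk (sum, count) pairs reach final_agg, which is the point of pre-aggregation.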

Page 17

PG-Strom: Limits

Latency

0.2-0.3 sec to initialize GPU device

Max. concurrent sessions

up to 3-5

Database size:

10 GB = data held in PostgreSQL's shared buffers or in the operating system's disk cache

Tip: use pg_prewarm

See http://strom.kaigai.gr.jp/install.html
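The initialization latency matters most for short queries. A back-of-the-envelope model, assuming the 0.2-0.3 s is paid once by the query, that the remaining work is accelerated by a constant factor s, and ignoring data transfer and the session limit, gives a break-even point: the GPU path wins when t_cpu > t_init + t_cpu / s, i.e. t_cpu > t_init * s / (s - 1). The function names below are hypothetical.

```python
# Back-of-the-envelope model (assumptions: fixed init latency paid per query,
# constant speedup on the remaining work, no transfer or queueing costs).


def gpu_total_time(cpu_seconds: float, speedup: float, init_seconds: float) -> float:
    """Estimated GPU-path runtime: device init latency plus accelerated work."""
    return init_seconds + cpu_seconds / speedup


def break_even_cpu_seconds(speedup: float, init_seconds: float) -> float:
    """CPU runtime above which the GPU path is faster.

    Solves cpu = init + cpu / speedup for cpu; needs speedup > 1.
    """
    if speedup <= 1:
        raise ValueError("speedup must be greater than 1")
    return init_seconds * speedup / (speedup - 1)


def gpu_pays_off(cpu_seconds: float, speedup: float, init_seconds: float) -> bool:
    """True if the modeled GPU runtime beats the plain CPU runtime."""
    return gpu_total_time(cpu_seconds, speedup, init_seconds) < cpu_seconds
```

Taking the factor 3 estimated for RDBMS + GPU, break_even_cpu_seconds(3, 0.2) is about 0.3 s and break_even_cpu_seconds(3, 0.3) about 0.45 s; under this model, queries that already finish faster than that on the CPU would not benefit.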

Page 18

PG-Strom: Performance

Estimations:

RDBMS + GPU => factor 3

Columnar In-Memory => factor 10

Pure GPU => factor 100

Benchmarks

See next slides

See seminar on 22 January 2018, 14-16h, HSR Rapperswil

Page 19

Page 20

PG-Strom: Further development

Version 1.x

More concurrent sessions

Data size: SSD collaboration feature at v2.0

PostGIS?

Where does it stand compared to the Reference Architecture?

Page 21

GPU Databases - Public Presentations in the Database Systems Seminar at HSR


Page 22

Seminar

SW:

PG-Strom 1.0 / PostgreSQL 9.5

MapD Open Source Edition

PostgreSQL 10, Tuned

HW:

Commodity server ("pizza box")

IBM Power8 server ("pizza box")

Data, Benchmarks, Docker-Files

See https://wiki.hsr.ch/Datenbanken/wiki.cgi?SeminarDatenbanksystemeHS1718

Page 23

Seminar

Benchmarks:

Cold start PG-Strom, MapD, PostgreSQL (= 3x)

Warm start PG-Strom, MapD, PostgreSQL (= 3x)

Presentations:

4 students

Spoken in German, report in English

Final (public) presentations:

22 January 2018, 14-16h

HSR Rapperswil, Room 8.125

Registration: http://techup.ch/tag/htap

Page 24

Discussion

Credits

Kohei KaiGai

Stefan Keller

Geometa Lab at Institute for Software

HSR Hochschule für Technik Rapperswil

www.hsr.ch/geometalab

@sfkeller
