Date post: 11-Mar-2020
GPU Accelerated End-to-End Analytics, Rodrigo Aramburu, rodrigo@blazingdb.com, @rodaramburu @blazingdb
Transcript
Page 1:

GPU Accelerated End-to-End Analytics

Rodrigo Aramburu
rodrigo@blazingdb.com
@rodaramburu @blazingdb

Page 2:

GPUs are well known for accelerating the training of machine learning and deep learning models.

Deep Learning (Neural Networks)

Machine Learning

Performance improvements increase at scale.

40x Improvement over CPU.

Page 3:

GPUs are awesome.

Source: NVIDIA CUDA C Programming Guide

Page 4:

GPUs are awesome.

System                           Peak memory bandwidth   Random 8B access
High-end CPU (6-channel DDR4)    120 GB/s                6 GB/s
NVIDIA Tesla V100                900 GB/s                60 GB/s
NVIDIA DGX-2 (16 x V100)         16 x 900 GB/s           16 x 60 GB/s

Page 5:

Except they suck… sometimes.

System                           Peak memory bandwidth   Random 8B access   Memory capacity   PCIe Gen 3 bandwidth
High-end CPU (6-channel DDR4)    120 GB/s                6 GB/s             1TB+              N/A
NVIDIA Tesla V100                900 GB/s                60 GB/s            32GB              12 GB/s
NVIDIA DGX-2 (16 x V100)         16 x 900 GB/s           16 x 60 GB/s       512GB             4 x 12 GB/s
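The figures for the Tesla V100 show how lopsided the costs are. As a rough back-of-the-envelope sketch (assuming the peak rates are sustained, which real workloads rarely achieve; the function and variable names are illustrative only), filling the card's 32GB over PCIe Gen 3 takes far longer than scanning that data once it sits in GPU memory:

```python
# Figures taken from the V100 row of the slide; peak rates are assumed to be sustained.
V100_MEMORY_GB = 32
PCIE_GEN3_GB_PER_S = 12
V100_BANDWIDTH_GB_PER_S = 900


def seconds_to_move(size_gb: float, rate_gb_per_s: float) -> float:
    """Time to move size_gb at a constant rate of rate_gb_per_s."""
    return size_gb / rate_gb_per_s


load_s = seconds_to_move(V100_MEMORY_GB, PCIE_GEN3_GB_PER_S)
scan_s = seconds_to_move(V100_MEMORY_GB, V100_BANDWIDTH_GB_PER_S)

print(f"load over PCIe Gen 3: {load_s:.2f} s")
print(f"scan in GPU memory:   {scan_s:.3f} s")
print(f"ratio:                {load_s / scan_s:.0f}x")
```

Under these assumptions the transfer is 75 times slower than the scan (900 / 12), which is why getting data onto the GPU, rather than processing it there, tends to be the bottleneck.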

Page 6:

But don’t worry, hardware to the rescue...

NVLink/NVSwitch: Supports 150 GB/s for GPU-to-GPU communication if you properly use CUDA IPC.

PCIe Gen 4: Gen 3 is ubiquitous, but HPC servers have started integrating Gen 4, doubling PCIe bandwidth.

GPUDirect: With RDMA and InfiniBand, files can skip the OS kernel call and load directly into GPU memory over a specialized fabric.

Page 7:

Data Center Topology becomes nightmarish...

[Diagram: CPU, PCIe switches, GPUs, NICs, and NVMe drives, with NVLink + NVSwitch between GPUs. Link labels: 12GB/s and 150GB/s. Annotation: "This is only half!"]

Page 8:

All the companies in the GPU ecosystem are building the same code.

Page 9:

Open Source Ecosystem

Expertise: GPU DBMS · GPU Columnar Analytics · Data Lakes

Expertise: CUDA · Machine Learning · Deep Learning

Expertise: Python · Data Science · Machine Learning

Page 10:

RAPIDS, the end-to-end GPU analytics ecosystem

cuDF: Data Preparation

cuML: Machine Learning

cuGRAPH: Graph Analytics

Data Preparation · Model Training · Visualization

A set of open source libraries for GPU accelerating data preparation and machine learning.

In GPU Memory

● Launched Oct. 2018

● 16,000+ Installs

● 75+ Contributors

Page 11:

RAPIDS, the end-to-end GPU analytics ecosystem

In GPU Memory · Zero-copy reads · Columnar

[Diagram: GPU DataFrame (GDF) with columns Col A, Col B, and Col C; each column has NULLS, Metadata, and Values.]
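As a minimal CPU-side sketch of the layout in the diagram, a column can be modeled as a contiguous values buffer plus a validity bitmask that marks nulls. This assumes an Apache Arrow-style mask (one bit per row, least-significant bit first, 1 = valid); the `Column` class and helper names are hypothetical and are not the cuDF API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """One column: a dense values buffer plus a validity bitmask (hypothetical)."""
    name: str
    values: List[int]      # a slot exists for every row, even null ones
    validity: bytearray    # bit i == 1 means row i is valid (not null)

    def is_valid(self, row: int) -> bool:
        return bool((self.validity[row // 8] >> (row % 8)) & 1)


def make_column(name: str, data: List[Optional[int]]) -> Column:
    """Build a Column from a Python list where None marks a null."""
    validity = bytearray((len(data) + 7) // 8)
    values = []
    for row, item in enumerate(data):
        if item is None:
            values.append(0)  # placeholder; masked out by the validity bit
        else:
            values.append(item)
            validity[row // 8] |= 1 << (row % 8)
    return Column(name, values, validity)


def column_sum(col: Column) -> int:
    """Sum only the valid rows."""
    return sum(v for row, v in enumerate(col.values) if col.is_valid(row))


col_a = make_column("Col A", [10, None, 30, 40])
print(col_a.values, bin(col_a.validity[0]), column_sum(col_a))
```

Keeping values dense and nulls in a separate mask means a kernel can stream through the values buffer with regular, coalesced reads and consult the mask only where nulls matter.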

Page 12:

RAPIDS, the end-to-end GPU analytics ecosystem

In GPU Memory · GPU Compute Kernels · Pandas-like API · C++ API

cuDF: Data Preparation

CUDA DataFrame (cuDF)

Page 13:

BlazingSQL: The GPU SQL Engine on RAPIDS AI

A SQL engine built on RAPIDS AI.

Query enterprise data lakes lightning fast with full interoperability with the RAPIDS AI stack.

Page 14:

Getting Started Demo

Page 15:

In the life of a query.

[Diagram components: User, SQL, Python Connector (BlazingSQL Context), Orchestrator / Coordinator, Parser + Planner (Apache Calcite), Workers 1-4 (IO, RAL, cuDF), GPU DataFrame, RAPIDS (cuML, cuGRAPH), cuDNN.]

Page 16:

GPU Decompression

Decompression on GPUs

[Diagram: compressed columns (Dict, Dict, RLE, Dict) alongside a GDF with columns Col A, Col B, and Col C; each column has NULLS, Metadata, and Values.]

Source: NVIDIA GTC 2018 - Nikolay Sakharnykh (NVIDIA) & Felipe Aramburu (BlazingDB)

● Applying Compression to TPC-H (Q4, SF1000)

● Cascading Compression

● 14x Compression (l_orderkey at SF1000)

New PR (Not Merged)
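Cascading compression stacks simple encodings so that the output of one stage feeds the next. The sketch below is a CPU-side illustration only, not the GPU implementation from the talk: it dictionary-encodes a column and then run-length encodes the resulting codes. All function names are hypothetical and the sample values are made up.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with an integer code into a dictionary of distinct values."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def rle_encode(codes: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs


def decode(dictionary: List[str], runs: List[Tuple[int, int]]) -> List[str]:
    """Undo both stages: expand runs, then look codes up in the dictionary."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


column = ["AIR", "AIR", "AIR", "RAIL", "RAIL", "AIR", "SHIP", "SHIP"]
dictionary, codes = dict_encode(column)
runs = rle_encode(codes)
assert decode(dictionary, runs) == column  # round trip is lossless
print(dictionary, runs)
```

On sorted or low-cardinality columns the run list is much shorter than the input, which is where the compression ratio comes from.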

Page 17:

Unified Communication X (UCX)

What is UCX?

UCX is an open-source, production-grade communication framework for data-centric and high-performance applications.

Page 18:

Data Center Topology w/ UCX

[Diagram: the same topology as on page 7 (CPU, PCIe switches, GPUs, NICs, NVMe drives, NVLink + NVSwitch). Link labels: 12GB/s and 150GB/s.]

Page 19:

GPU Primitives

cuDF (Single-GPU)

● gdf_radixsort_i8()
● gdf_transpose()
● gdf_inner_join()
● gdf_hash()
● gdf_sum()
● gdf_product()
● gdf_max()
● gdf_filter()
● gdf_group_by_sum()
● gdf_group_by_count()
● gdf_order_by()

IO

● read_csv()
● read_parquet()
● gdf_to_csr()

cuDF (Multi-GPU)

● gdf_hash_partition()
● scatter()
● gather()
● slice()

https://github.com/rapidsai/cudf
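To illustrate what a multi-GPU primitive such as gdf_hash_partition() is for, the sketch below partitions rows by a hash of the key so that equal keys always land in the same partition, which lets each GPU join or aggregate its own partition independently. It is plain Python with hypothetical names, does not call cuDF, and uses `zlib.crc32` only as a stand-in hash function.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (key, value); hypothetical row shape


def hash_partition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Assign each row to partition hash(key) % num_partitions."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, value in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        p = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[p].append((key, value))
    return partitions


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6)]
parts = hash_partition(rows, num_partitions=4)

# Every key appears in exactly one partition.
owner: Dict[str, int] = {}
for p, part in enumerate(parts):
    for key, _ in part:
        assert owner.setdefault(key, p) == p

print([len(part) for part in parts])
```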

Page 20:

Distributed Result Sets

[Diagram: BlazingSQL Workers 1-4 (IO, RAL, cuDF) hold GPU DataFrame0 and GPU DataFrame1, referenced by gdf_token[0] and gdf_token[1]. Dask Workers 1-4 (IO, cuDF) hold GPU DataFrame0 and GPU DataFrame1 and feed RAPIDS (cuML, cuGraph). The two sides are connected by Zero-Copy IPC.]

Page 21:

BlazingSQL + XGBoost Loan Risk Demo

Train a model to assess risk of new mortgage loans based on Fannie Mae loan performance data.

ETL/Feature Engineering · XGBoost Training

Mortgage Data

● 4.22M Loans
● 148M Perf. Records
● CSV Files on HDFS

Cluster (1 Node)

● 16 vCPUs per node
● 1 Tesla T4 GPU
● 2560 CUDA Cores
● 16GB VRAM

Cluster (4 Nodes)

● 8 vCPUs per node
● 30GB RAM

Page 22:

RAPIDS + BlazingSQL outperforms traditional CPU pipelines

[Bar chart: Demo Timings (ETL Phase). Bars: 3.8GB (1 x T4), 3.8GB (4 Nodes), 15.6GB (1 x T4), 15.6GB (4 Nodes). Axis: time in seconds, 0 to 3000.]

Page 23:

Scale up the data on a DGX-1 (4 x V100 GPUs)

Page 24:

BlazingSQL + Graphistry Netflow Analysis

Visually analyze the VAST netflow data set inside Graphistry in order to quickly detect anomalous events.

ETL · Visualization

Netflow Data

● 65M Events
● 2 Weeks
● 1,440 Devices

Page 25:

Benchmarks

Netflow Demo Timings (ETL Only)

Page 26:

Upcoming BlazingSQL Releases

● V0.1, Query GDFs: Use the PyBlazing connection to execute SQL queries on GDFs that are loaded by the cuDF API.

● V0.2, Direct Query Flat Files: Integrate FileSystem API, adding the ability to directly query flat files (Apache Parquet & CSV) inside distributed file systems.

● V0.3, Distributed Scheduler: SQL queries are fanned out across multiple GPUs and servers.

● V0.4, String Support: String support and string operation support.

● V0.5, Physical Plan Optimizer: Partition culling for where clauses and joins.
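Partition culling for a where clause can be sketched as follows: if each partition (for example, a file or row group) records the minimum and maximum of a column, the planner can rule out partitions without reading them. This is a hypothetical illustration with made-up names and data, not BlazingSQL's optimizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """Hypothetical partition descriptor with min/max statistics for one column."""
    path: str
    min_value: int
    max_value: int


def cull_for_range(partitions: List[Partition], low: int, high: int) -> List[Partition]:
    """Keep partitions that may contain rows satisfying: low <= column <= high."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]


partitions = [
    Partition("part-0", 1, 100),
    Partition("part-1", 101, 200),
    Partition("part-2", 201, 300),
]

# WHERE column BETWEEN 150 AND 220
survivors = cull_for_range(partitions, low=150, high=220)
print([p.path for p in survivors])
```

Here part-0 is skipped because its maximum (100) is below the lower bound, so only two of the three partitions need to be loaded onto the GPU.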

Page 27:

Get Started

BlazingSQL is quick to get up and running using either DockerHub or Conda Install:

https://hub.docker.com/r/blazingdb/blazingsql/

https://github.com/BlazingDB/blazingsql-conda-environment

