Rodrigo Aramburu · rodrigo@blazingdb.com · @rodaramburu

Posted on 11-Mar-2020



GPU Accelerated End-to-End Analytics

Rodrigo Aramburu · rodrigo@blazingdb.com

@rodaramburu · @blazingdb

@blazingdb

GPUs are well known for accelerating the training of machine learning and deep learning models.

Deep Learning (Neural Networks)

Machine Learning

Performance improvements increase at scale.

40x improvement over CPU.


GPUs are awesome.

Source: NVIDIA CUDA C Programming Guide


GPUs are awesome.

                               Peak memory bandwidth  Random 8B access
High End CPU (6-channel DDR4)  120 GB/s               6 GB/s
NVIDIA Tesla V100              900 GB/s               60 GB/s
NVIDIA DGX-2 (16 x V100)       16 x 900 GB/s          16 x 60 GB/s
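The DGX-2 row is a per-GPU figure multiplied out. The arithmetic below is a minimal sketch that does that multiplication and compares the result with the CPU row. It uses only the numbers in the table and assumes perfect scaling across all 16 GPUs, which real workloads will not reach.

```python
# Figures from the table above, in GB/s.
CPU_PEAK, CPU_RANDOM = 120, 6
V100_PEAK, V100_RANDOM = 900, 60
NUM_GPUS = 16  # DGX-2: 16 x V100


def aggregate(per_gpu: int, gpus: int) -> int:
    """Ideal aggregate bandwidth, assuming perfect scaling across GPUs."""
    return per_gpu * gpus


dgx2_peak = aggregate(V100_PEAK, NUM_GPUS)      # 14,400 GB/s
dgx2_random = aggregate(V100_RANDOM, NUM_GPUS)  # 960 GB/s

print(f"DGX-2 peak: {dgx2_peak} GB/s ({dgx2_peak // CPU_PEAK}x the CPU)")
print(f"DGX-2 random 8B: {dgx2_random} GB/s ({dgx2_random // CPU_RANDOM}x the CPU)")
print(f"One V100 vs. CPU, peak: {V100_PEAK / CPU_PEAK:.1f}x")
```

Even a single V100 is about 7.5x the CPU on peak bandwidth and 10x on random 8-byte access; the DGX-2 numbers only hold if the work divides cleanly across GPUs.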


Except they suck… sometimes.

                               Peak memory bandwidth  Random 8B access  Memory capacity  PCIe Gen 3 bandwidth
High End CPU (6-channel DDR4)  120 GB/s               6 GB/s            1 TB+            N/A
NVIDIA Tesla V100              900 GB/s               60 GB/s           32 GB            12 GB/s
NVIDIA DGX-2 (16 x V100)       16 x 900 GB/s          16 x 60 GB/s      512 GB           4 x 12 GB/s
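One way to read this table: compare how long it takes to fill a V100 over PCIe Gen 3 with how long the GPU needs to stream that same data from its own memory. The sketch below uses only the table's peak figures; real transfers and kernels run below peak, so treat the ratio as an idealized bound.

```python
# Figures from the table above.
V100_MEMORY_GB = 32
V100_PEAK_GBPS = 900   # on-device memory bandwidth
PCIE_GEN3_GBPS = 12    # host-to-device bandwidth


def seconds(gigabytes: float, gb_per_s: float) -> float:
    """Time to move `gigabytes` at a sustained rate of `gb_per_s`."""
    return gigabytes / gb_per_s


load_s = seconds(V100_MEMORY_GB, PCIE_GEN3_GBPS)  # copy in over PCIe
scan_s = seconds(V100_MEMORY_GB, V100_PEAK_GBPS)  # one pass over device memory

print(f"Load 32 GB over PCIe Gen 3: {load_s:.2f} s")
print(f"Scan 32 GB in GPU memory:   {scan_s:.3f} s")
print(f"PCIe is {load_s / scan_s:.0f}x slower than a scan")
```

At these figures the copy takes about 2.67 s while the scan takes about 0.036 s, a 75x gap, which is why moving data into the GPU tends to dominate an analytics pipeline.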


But don’t worry, hardware to the rescue...

PCIe Gen 4
Gen 3 is ubiquitous, but HPC servers have started integrating Gen 4, which doubles PCIe bandwidth.

NVLink/NVSwitch
Supports 150 GB/s for GPU-to-GPU communication if you properly use CUDA IPC.

GPUDirect
With RDMA and InfiniBand, files can skip the OS kernel call and load directly into GPU memory over a specialized fabric.


Data Center Topology becomes nightmarish...

[Figure: a CPU fans out through several layers of PCIe switches to GPUs, NICs, and NVMe drives over PCIe links at 12 GB/s; groups of GPUs are also connected through NVLink + NVSwitch at 150 GB/s.]

This is only half!
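What makes the topology hard is that a transfer is only as fast as the slowest link on its route. The sketch below models a small, hypothetical slice of the diagram as a graph of link bandwidths and finds the widest path (the route whose slowest link is fastest). The node names and the exact wiring are assumptions for illustration, and the exhaustive search is only suitable for tiny graphs like this one.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified slice of the topology: link bandwidths in GB/s.
LINKS: List[Tuple[str, str, float]] = [
    ("gpu0", "pcie_sw0", 12.0), ("gpu1", "pcie_sw0", 12.0),
    ("gpu2", "pcie_sw1", 12.0),
    ("pcie_sw0", "cpu", 12.0), ("pcie_sw1", "cpu", 12.0),
    ("gpu0", "nvswitch", 150.0), ("gpu1", "nvswitch", 150.0),
]


def build_graph(links: List[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Undirected adjacency map: node -> {neighbor: bandwidth}."""
    graph: Dict[str, Dict[str, float]] = {}
    for a, b, bw in links:
        graph.setdefault(a, {})[b] = bw
        graph.setdefault(b, {})[a] = bw
    return graph


def best_path_bandwidth(graph: Dict[str, Dict[str, float]], src: str, dst: str) -> float:
    """Widest path: a route's throughput is capped by its slowest link."""
    best = 0.0

    def dfs(node: str, visited: frozenset, bottleneck: float) -> None:
        nonlocal best
        if node == dst:
            best = max(best, bottleneck)
            return
        for nxt, bw in graph[node].items():
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, min(bottleneck, bw))

    dfs(src, frozenset({src}), float("inf"))
    return best


g = build_graph(LINKS)
print(best_path_bandwidth(g, "gpu0", "gpu1"))  # 150.0: an NVLink route exists
print(best_path_bandwidth(g, "gpu0", "gpu2"))  # 12.0: every route crosses PCIe
```

Two GPUs that look equally close in the diagram can differ by more than 10x in reachable bandwidth, depending on whether they share an NVLink fabric.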


All the companies in the GPU ecosystem are building the same code.


Open Source Ecosystem

Expertise: GPU DBMS · GPU Columnar Analytics · Data Lakes

Expertise: CUDA · Machine Learning · Deep Learning

Expertise: Python · Data Science · Machine Learning


RAPIDS, the end-to-end GPU analytics ecosystem

cuDF: Data Preparation · cuML: Machine Learning · cuGraph: Graph Analytics

Pipeline stages: Data Preparation, Model Training, Visualization, all in GPU memory.

A set of open source libraries for GPU-accelerating data preparation and machine learning.

● Launched Oct. 2018

● 16,000+ Installs

● 75+ Contributors



GPU DataFrame (GDF): In GPU Memory · Zero-copy reads · Columnar

[Figure: a GPU DataFrame made of columns Col A, Col B, and Col C; each column stores metadata, a values buffer, and a NULLS buffer.]
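The GDF keeps each column as metadata, a contiguous values buffer, and a separate NULLS buffer. A minimal pure-Python sketch of that layout follows, assuming an Apache Arrow-style validity bitmap (one bit per row, least-significant bit first, 1 = valid). The `Column` class and its fields are hypothetical and are not cuDF's API; the point is that nulls cost one bit per row and never disturb the fixed-width values buffer.

```python
from array import array
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str            # metadata
    dtype: str           # metadata: array typecode, "q" = 64-bit signed int
    values: array        # contiguous values buffer
    validity: bytearray  # NULLS buffer: one bit per row, LSB first, 1 = valid

    @classmethod
    def from_list(cls, name: str, items: List[Optional[int]]) -> "Column":
        # NULL slots still occupy a value slot (filled with 0) so rows stay aligned.
        values = array("q", (0 if v is None else v for v in items))
        validity = bytearray((len(items) + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                validity[i // 8] |= 1 << (i % 8)
        return cls(name, "q", values, validity)

    def is_valid(self, i: int) -> bool:
        return bool((self.validity[i // 8] >> (i % 8)) & 1)

    def null_count(self) -> int:
        return sum(not self.is_valid(i) for i in range(len(self.values)))

    def total(self) -> int:
        # Skip NULL slots; their placeholder values are never read.
        return sum(v for i, v in enumerate(self.values) if self.is_valid(i))


col_a = Column.from_list("col_a", [4, None, 7, 1, None])
print(col_a.null_count(), col_a.total())  # 2 12
```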



CUDA DataFrame (cuDF): In GPU Memory · GPU Compute Kernels · Pandas-like API · C++ API


BlazingSQL: The GPU SQL Engine on RAPIDS AI

A SQL engine built on RAPIDS AI. Query enterprise data lakes lightning fast, fully interoperable with the RAPIDS AI stack.


Getting Started Demo


In the life of a query.

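[Figure: the user's SQL enters through the Python Connector (BlazingSQL Context) on the Coordinator/Orchestrator, where the Parser + Planner (Apache Calcite) builds a plan; Workers 1 to 4 each run IO, RAL, and cuDF to produce a GPU DataFrame that is shared with RAPIDS libraries (cuML, cuGraph) and cuDNN.]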
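The figure splits a query between a coordinator that plans it and workers that execute it on their own partitions. The sketch below mimics that split for one aggregate query in plain Python, under loud assumptions: the plan is written by hand rather than produced by a parser such as Apache Calcite, each "worker" is just a function call over a list, and the data is made up. Two-phase (partial, then final) aggregation is a common distributed-SQL pattern; this is not a description of BlazingSQL's internals.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (key, value)

# Hypothetical data: one partition of table t per worker.
PARTITIONS: List[List[Row]] = [
    [("a", 5), ("b", -2), ("a", 1)],  # worker 1
    [("b", 4), ("c", 3)],             # worker 2
    [("a", 2), ("c", -7), ("b", 1)],  # worker 3
]


def worker_execute(rows: List[Row]) -> Dict[str, int]:
    """Local plan fragment: filter (value > 0), then partial SUM per key."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        if value > 0:
            partial[key] += value
    return dict(partial)


def coordinator_merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Final phase: combine every worker's partial sums into the result."""
    result: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for key, subtotal in partial.items():
            result[key] += subtotal
    return dict(sorted(result.items()))


# SELECT key, SUM(value) FROM t WHERE value > 0 GROUP BY key
partials = [worker_execute(p) for p in PARTITIONS]
print(coordinator_merge(partials))  # {'a': 8, 'b': 5, 'c': 3}
```

Only the small partial results travel to the coordinator, which is what keeps this pattern cheap when the partitions live in separate GPU memories.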


GPU Decompression: Decompression on GPUs

[Figure: compressed column data (Dict, Dict, RLE, and Dict encodings) is decompressed on the GPU into a GPU DataFrame whose columns (Col A, Col B, Col C) each hold metadata, values, and NULLS.]

Source: NVIDIA GTC 2018 - Nikolay Sakharnykh (NVIDIA) & Felipe Aramburu (BlazingDB)

● Applying compression to TPC-H (Q4, SF1000)

● Cascading compression

● 14x compression (l_orderkey at SF1000)

New PR (not merged)
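Cascading compression applies one encoding on top of another. A minimal CPU-side sketch follows, assuming the simplest cascade suggested by the figure's labels: dictionary encoding, then run-length encoding (RLE) of the dictionary codes. The GPU work in the talk decodes such formats in parallel; this sequential version only shows the data transformations, and the sample column is made up (its values mimic TPC-H order priorities).

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with an index into a dictionary of distinct values."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        codes.append(positions[v])
    return dictionary, codes


def rle_encode(codes: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal codes into (code, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    return [c for c, n in runs for _ in range(n)]


def dict_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    return [dictionary[c] for c in codes]


# Cascade: dictionary first, then RLE over the small integer codes.
column = ["1-URGENT"] * 4 + ["3-MEDIUM"] * 3 + ["1-URGENT"] * 2
dictionary, codes = dict_encode(column)
runs = rle_encode(codes)
print(dictionary)  # ['1-URGENT', '3-MEDIUM']
print(runs)        # [(0, 4), (1, 3), (0, 2)]

# Decompression runs the cascade in reverse order.
assert dict_decode(dictionary, rle_decode(runs)) == column
```

Each stage makes the next one more effective: dictionary encoding turns strings into small integers, and sorted or clustered integers form long runs that RLE collapses.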


Unified Communication X (UCX)

What is UCX?

UCX is an open-source, production-grade communication framework for data-centric and high-performance applications.


Data Center Topology w/ UCX

[Figure: the same topology of CPU, PCIe switches (12 GB/s), GPUs, NICs, NVMe drives, and NVLink + NVSwitch (150 GB/s), with UCX as the communication layer across it.]


GPU Primitives

cuDF (Single-GPU)

● gdf_radixsort_i8()
● gdf_transpose()
● gdf_inner_join()
● gdf_hash()
● gdf_sum()
● gdf_product()
● gdf_max()
● gdf_filter()
● gdf_group_by_sum()
● gdf_group_by_count()
● gdf_order_by()

IO

● read_csv()
● read_parquet()
● gdf_to_csr()

cuDF (Multi-GPU)

● gdf_hash_partition()
● scatter()
● gather()
● slice()

https://github.com/rapidsai/cudf
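gdf_hash_partition() is listed as a multi-GPU building block. The pure-Python sketch below shows what hash partitioning achieves: rows with equal keys always land in the same partition, so once both sides of a join are partitioned the same way, each GPU can join its own partitions without seeing the others. It assumes a crc32-based hash chosen here for illustration (not necessarily the hash cuDF uses), and `hash_partition` is a hypothetical helper, not cuDF's signature.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (key, payload)


def partition_of(key: int, num_partitions: int) -> int:
    # crc32 is deterministic across processes, unlike Python's salted str hash.
    return zlib.crc32(key.to_bytes(8, "little", signed=True)) % num_partitions


def hash_partition(rows: List[Row], num_partitions: int) -> Dict[int, List[Row]]:
    """Send every row to the partition chosen by hashing its key."""
    parts: Dict[int, List[Row]] = {p: [] for p in range(num_partitions)}
    for row in rows:
        parts[partition_of(row[0], num_partitions)].append(row)
    return parts


orders = [(1, "o1"), (2, "o2"), (3, "o3"), (4, "o4")]
lineitems = [(1, "l1"), (1, "l2"), (3, "l3"), (4, "l4")]
orders_p = hash_partition(orders, 2)
lineitems_p = hash_partition(lineitems, 2)

# Equal keys share a partition, so each partition pair joins locally.
for p in range(2):
    local = [(k, a, b) for k, a in orders_p[p] for k2, b in lineitems_p[p] if k == k2]
    print(p, local)
```

Which partition a given key lands in depends on the hash, but the union of the local joins always equals the full join.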


Distributed Result Sets

[Figure: on one side, BlazingSQL Workers 1 to 4 (IO, RAL, cuDF) each hold part of a query result as GPU DataFrames (GPU DataFrame0, GPU DataFrame1) and hand back tokens (gdf_token[0], gdf_token[1]); on the other, Dask Workers 1 to 4 open the same GPU DataFrames through zero-copy IPC and pass them to RAPIDS cuML and cuGraph.]
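The handoff in the figure passes small tokens (gdf_token[0], gdf_token[1]) instead of the data, and the receiving side opens the same GPU memory through zero-copy IPC. The sketch below is only an in-process Python analogy: a hypothetical registry maps tokens to buffers, and the consumer gets a memoryview of the producer's bytearray, so both sides see the same bytes without a copy. Real cross-process GPU sharing goes through CUDA IPC handles, which this code does not use.

```python
from typing import Dict

# Hypothetical registry: token -> buffer owned by the producer.
_REGISTRY: Dict[int, bytearray] = {}


def publish(buffer: bytearray) -> int:
    """Producer side: register a buffer and return a small token for it."""
    token = len(_REGISTRY)
    _REGISTRY[token] = buffer
    return token


def open_token(token: int) -> memoryview:
    """Consumer side: resolve a token to a view of the same memory (no copy)."""
    return memoryview(_REGISTRY[token])


result_set = bytearray(b"partition-0")  # stands in for one GPU DataFrame
token = publish(result_set)
view = open_token(token)

result_set[0] = ord("P")   # the producer writes in place...
print(bytes(view))         # ...and the consumer sees it: b'Partition-0'
print(view.obj is result_set)  # True: the view wraps the original buffer
```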


BlazingSQL + XGBoost Loan Risk Demo: Train a model to assess the risk of new mortgage loans based on Fannie Mae loan performance data.

Stages: ETL/Feature Engineering → XGBoost Training

Mortgage data: 4.22M loans, 148M performance records, CSV files on HDFS

GPU cluster: 1 node, 16 vCPUs per node, 1 Tesla T4 GPU (2,560 CUDA cores, 16 GB VRAM)

CPU cluster: 4 nodes, 8 vCPUs per node + 30 GB RAM


RAPIDS + BlazingSQL outperforms traditional CPU pipelines

[Chart: Demo Timings (ETL Phase), time in seconds (axis 0 to 3,000) for 3.8 GB (1 x T4), 3.8 GB (4 Nodes), 15.6 GB (1 x T4), and 15.6 GB (4 Nodes).]


Scale up the data on a DGX-1 (4 x V100 GPUs)


BlazingSQL + Graphistry Netflow Analysis: Visually analyze the VAST netflow data set inside Graphistry to quickly detect anomalous events.

Stages: ETL → Visualization

Netflow data: 65M events, 2 weeks, 1,440 devices


Benchmarks: Netflow Demo Timings (ETL Only)


Upcoming BlazingSQL Releases

V0.1 Query GDFs: Use the PyBlazing connection to execute SQL queries on GDFs that are loaded by the cuDF API.

V0.2 Direct Query Flat Files: Integrate the FileSystem API, adding the ability to directly query flat files (Apache Parquet & CSV) inside distributed file systems.

V0.3 Distributed Scheduler: SQL queries are fanned out across multiple GPUs and servers.

V0.4 String Support: Support for strings and string operations.

V0.5 Physical Plan Optimizer: Partition culling for WHERE clauses and joins.
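Partition culling skips data that cannot match a predicate. A minimal sketch follows, assuming the common approach of per-partition min/max statistics (the kind Parquet stores per row group). The slide does not say how BlazingSQL implements culling, and the file names, `PartitionStats`, and `cull` are hypothetical. Joins can be culled in the same spirit by intersecting the key ranges of the two sides, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionStats:
    path: str     # hypothetical file or row-group name
    min_key: int  # smallest value of the filtered column in this partition
    max_key: int  # largest value of the filtered column in this partition


def cull(partitions: List[PartitionStats], lo: int, hi: int) -> List[str]:
    """Keep only partitions whose [min, max] range overlaps lo <= key <= hi."""
    return [p.path for p in partitions if p.max_key >= lo and p.min_key <= hi]


stats = [
    PartitionStats("part-0.parquet", 1, 1_000),
    PartitionStats("part-1.parquet", 1_001, 2_000),
    PartitionStats("part-2.parquet", 2_001, 3_000),
]

# WHERE key BETWEEN 1500 AND 2100 only needs to read two of the three files.
print(cull(stats, 1_500, 2_100))  # ['part-1.parquet', 'part-2.parquet']
```

Culling happens before any data is read, so the savings come from IO and PCIe transfers that never take place.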


Get Started: BlazingSQL is quick to get up and running using either DockerHub or Conda install:

https://hub.docker.com/r/blazingdb/blazingsql/

https://github.com/BlazingDB/blazingsql-conda-environment