Rodrigo Aramburu rodrigo@blazingdb.com @rodaramburu


GPU Accelerated End-to-End Analytics

Rodrigo Aramburu

rodrigo@blazingdb.com

@rodaramburu · @blazingdb

GPUs are well known for accelerating the training of machine learning and deep learning models.

Deep Learning (Neural Networks)

Machine Learning

Performance improvements increase at scale.

40x Improvement over CPU.

GPUs are awesome.

Source: NVIDIA CUDA C Programming Guide

GPUs are awesome.

System                           Peak memory bandwidth   Random 8B access
High-End CPU (6-channel DDR4)    120 GB/s                6 GB/s
NVIDIA Tesla V100                900 GB/s                60 GB/s
NVIDIA DGX-2 (16 x V100)         16 x 900 GB/s           16 x 60 GB/s

Except they suck… sometimes.

System                           Peak memory bandwidth   Random 8B access   Memory capacity   PCIe Gen 3 bandwidth
High-End CPU (6-channel DDR4)    120 GB/s                6 GB/s             1 TB+             N/A
NVIDIA Tesla V100                900 GB/s                60 GB/s            32 GB             12 GB/s
NVIDIA DGX-2 (16 x V100)         16 x 900 GB/s           16 x 60 GB/s       512 GB            4 x 12 GB/s
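To make the bottleneck concrete, here is a small back-of-the-envelope sketch using the V100 figures above. It assumes idealized transfers at peak bandwidth (real throughput is lower), and the helper name `seconds_to_move` is made up for this illustration.

```python
# Figures for a single Tesla V100, in GB and GB/s, as quoted above.
V100_MEMORY_GB = 32
V100_BANDWIDTH_GBPS = 900
PCIE_GEN3_GBPS = 12


def seconds_to_move(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized transfer time: size divided by peak bandwidth."""
    return size_gb / bandwidth_gbps


# Time to fill the V100's memory from the host over PCIe Gen 3.
load_s = seconds_to_move(V100_MEMORY_GB, PCIE_GEN3_GBPS)
# Time for the GPU to scan the same data once it is resident.
scan_s = seconds_to_move(V100_MEMORY_GB, V100_BANDWIDTH_GBPS)

print(f"load over PCIe: {load_s:.2f} s")   # load over PCIe: 2.67 s
print(f"scan on GPU:    {scan_s:.3f} s")   # scan on GPU:    0.036 s
print(f"ratio:          {load_s / scan_s:.0f}x")   # ratio:          75x
```

Under these assumptions, getting data onto the GPU takes about 75 times longer than scanning it once it is there, which is why keeping data in GPU memory matters.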

But don’t worry, hardware to the rescue...

NVLink/NVSwitch

Supports 150 GB/s for GPU-to-GPU communication if you properly use CUDA IPC.

PCIe Gen 4

Gen 3 is ubiquitous, but HPC servers have started integrating Gen 4, doubling PCIe bandwidth.

GPU Direct

With RDMA and InfiniBand, files can skip the OS kernel call and load directly to GPU memory over a specialized fabric.

Data Center Topology becomes nightmarish...

[Diagram: GPUs, NICs, and NVMe drives attached to PCIe switches (12 GB/s), with the GPUs also connected through NVLink + NVSwitch (150 GB/s).]

This is only half!

All the companies in the GPU ecosystem are building the same code.

Open Source Ecosystem

Expertise: GPU DBMS · GPU Columnar Analytics · Data Lakes

Expertise: CUDA · Machine Learning · Deep Learning

Expertise: Python · Data Science · Machine Learning

RAPIDS, the end-to-end GPU analytics ecosystem

cuDF: Data Preparation

cuML: Machine Learning

cuGRAPH: Graph Analytics

Data Preparation · Model Training · Visualization

A set of open source libraries for GPU accelerating data preparation and machine learning.

In GPU Memory

● Launched Oct. 2018

● 16,000+ Installs

● 75+ Contributors

In GPU Memory · Zero-copy reads · Columnar

[Diagram: three columns, each stored as Metadata plus Values.]

GPU DataFrame

In GPU Memory · GPU Compute Kernels · Pandas-like API · C++ API

CUDA DataFrame (cuDF)
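The "Columnar", "Metadata / Values", and "Zero-copy reads" ideas from the GPU DataFrame slide can be illustrated on the CPU with the standard library. This is only a sketch of the layout concept, not cuDF's actual data structure; the `Column` class and `make_column` helper are hypothetical names.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Column:
    """One column: a little metadata plus one contiguous buffer of values."""
    name: str          # metadata
    typecode: str      # metadata: array typecode, e.g. "q" = 64-bit int
    values: array      # contiguous, fixed-width values
    valid: list        # one flag per row; False marks a NULL

    def view(self, start: int, stop: int) -> memoryview:
        """Return a slice of the values without copying them."""
        return memoryview(self.values)[start:stop]


def make_column(name, typecode, data):
    """Build a column from a Python list, storing None as NULL."""
    valid = [x is not None for x in data]
    filled = [0 if x is None else x for x in data]
    return Column(name, typecode, array(typecode, filled), valid)


col = make_column("col_a", "q", [10, None, 30, 40])
window = col.view(2, 4)

# The view shares memory with the column: a write to the column's
# buffer is visible through the view, so no copy was made.
col.values[2] = 99
print(window.tolist())   # [99, 40]
print(col.valid)         # [True, False, True, True]
```

Because each column is one contiguous buffer, a consumer can be handed a view of it rather than a copy, which is the property the "zero-copy" label refers to.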

BlazingSQL: The GPU SQL Engine on RAPIDS AI

A SQL engine built on RAPIDS AI.

Query enterprise data lakes lightning fast with full interoperability with the RAPIDS AI stack.

Getting Started Demo

In the life of a query.

[Diagram components: User SQL; Python Connector (BlazingSQL Context); Orchestrator (Coordinator); Parser + Planner (Apache Calcite); Workers 1 to 4; GPU DataFrame; RAPIDS (cuGRAPH).]

GPU Decompression

Decompression on GPUs

[Diagram: a GDF with columns Col A, Col B (NULLS), and Col C (NULLS), each stored as Metadata plus Values.]

Source: NVIDIA GTC 2018 - Nikolay Sakharnykh (NVIDIA) & Felipe Aramburu (BlazingDB)

● Applying Compression to TPC-H (Q4, SF1000)

● Cascading Compression

● 14x Compression (l_orderkey at SF1000)
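As a rough illustration of what chaining lightweight compression schemes looks like, the sketch below applies run-length encoding and then delta encoding to a sorted key column. The choice of these two stages, the function names, and the sample data are assumptions made for this example; the scheme used in the referenced talk may differ.

```python
def rle_encode(values):
    """Run-length encode: return (run_values, run_lengths)."""
    run_values, run_lengths = [], []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1
        else:
            run_values.append(v)
            run_lengths.append(1)
    return run_values, run_lengths


def delta_encode(values):
    """Keep the first value, then store differences between neighbors."""
    if not values:
        return []
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_decode(run_values, run_lengths):
    """Invert rle_encode by repeating each value by its run length."""
    out = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out


# Made-up sorted key column with repeats, loosely shaped like an order key.
keys = [1000, 1000, 1000, 1001, 1001, 1004, 1004, 1004, 1004, 1005]

run_values, run_lengths = rle_encode(keys)   # stage 1: RLE
deltas = delta_encode(run_values)            # stage 2: delta on run values

print(run_values)    # [1000, 1001, 1004, 1005]
print(run_lengths)   # [3, 2, 4, 1]
print(deltas)        # [1000, 1, 3, 1]

# Decoding applies the inverse stages in reverse order.
restored = rle_decode(delta_decode(deltas), run_lengths)
print(restored == keys)   # True
```

After both stages, most of the remaining numbers are small, so a further stage such as bit-packing (not shown) could store them in far fewer bits than the original 64-bit keys.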

New PR (Not Merged)

Unified Communication X (UCX)

What is UCX?

UCX is an open-source, production-grade communication framework for data-centric and high-performance applications.

Data Center Topology w/ UCX

[Diagram: the same topology as before: GPUs, NICs, and NVMe drives attached to PCIe switches (12 GB/s), with the GPUs also connected through NVLink + NVSwitch (150 GB/s).]

GPU Primitives

IO

● read_csv()
● read_parquet()
● gdf_to_csr()

cuDF (Single-GPU)

● gdf_radixsort_i8()
● gdf_transpose()
● gdf_inner_join()
● gdf_hash()
● gdf_sum()
● gdf_product()
● gdf_max()
● gdf_filter()
● gdf_group_by_sum()
● gdf_group_by_count()
● gdf_order_by()

cuDF (Multi-GPU)

● gdf_hash_partition()
● scatter()
● gather()
● slice()

https://github.com/rapidsai/cudf
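To show why a hash-partition primitive matters for multi-GPU work, here is a CPU-only sketch of the idea: rows are routed to partitions by hashing the join key, so each worker can join its own pair of partitions independently. The Python functions `hash_partition` and `inner_join` below are stand-ins written for this illustration; they are not the cuDF functions of similar name, and the tables are made up.

```python
from zlib import crc32


def hash_partition(rows, key, num_partitions):
    """Split rows into num_partitions lists by hashing the key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's hash() for str.
        h = crc32(str(row[key]).encode("utf-8"))
        partitions[h % num_partitions].append(row)
    return partitions


def inner_join(left, right, key):
    """Hash join of two row lists on key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


orders = [{"id": i, "amount": 10 * i} for i in range(8)]
customers = [{"id": i, "name": f"c{i}"} for i in range(0, 8, 2)]

n = 4
order_parts = hash_partition(orders, "id", n)
customer_parts = hash_partition(customers, "id", n)

# Each "worker" joins only its own pair of partitions. Equal keys hash
# to the same partition, so no match is missed.
joined = []
for o_part, c_part in zip(order_parts, customer_parts):
    joined.extend(inner_join(o_part, c_part, "id"))

print(sorted(row["id"] for row in joined))   # [0, 2, 4, 6]
```

The per-partition joins share no data, so in a multi-GPU setting each pair could run on a different device.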

Distributed Result Sets

[Diagram: BlazingSQL Workers 1 to 4 (IO, RAL, cuDF) hold GPU DataFrame0 and GPU DataFrame1 and expose them as gdf_token[0] and gdf_token[1]; Dask Workers 1 to 4 access the same GPU DataFrames through Zero-Copy IPC for use in RAPIDS (cuDF, cuML, cuGraph).]

BlazingSQL + XGBoost Loan Risk Demo

Train a model to assess risk of new mortgage loans based on Fannie Mae loan performance data.

ETL/Feature Engineering · XGBoost Training

Mortgage Data: 4.22M Loans · 148M Perf. Records · CSV Files on HDFS

CLUSTER: 1 Node · 16 vCPUs per node · 1 Tesla T4 GPU · 2560 CUDA Cores · 16GB VRAM

CLUSTER: 4 Nodes · 8 vCPUs per node · 30GB RAM

RAPIDS + BlazingSQL outperforms traditional CPU pipelines

[Bar chart: Demo Timings (ETL Phase), time in seconds on an axis from 0 to 3000, comparing 1 x T4 against 4 Nodes at 3.8GB and 15.6GB.]

Scale up the data on a DGX-1 (4 x V100 GPUs)

BlazingSQL + Graphistry Netflow Analysis

Visually analyze the VAST netflow data set inside Graphistry in order to quickly detect anomalous events.

ETL · Visualization

Netflow Data: 65M Events · 2 Weeks · 1,440 Devices

Benchmarks

Netflow Demo Timings (ETL Only)

Upcoming BlazingSQL Releases

V0.1 Query GDFs: Use the PyBlazing connection to execute SQL queries on GDFs that are loaded by the cuDF API.

V0.2 Direct Query Flat Files: Integrate FileSystem API, adding the ability to directly query flat files (Apache Parquet & CSV) inside distributed file systems.

V0.3 Distributed Scheduler: SQL queries are fanned out across multiple GPUs and servers.

V0.4 String Support: String support and string operation support.

V0.5 Physical Plan Optimizer: Partition culling for where clauses and joins.
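Partition culling for a WHERE clause can be sketched as follows: if each file carries min/max statistics for a column, files whose range cannot satisfy the predicate are skipped before any data is read. The `PartitionStats` class, the file names, and the statistics are hypothetical; this is not BlazingSQL's optimizer code.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Hypothetical per-file statistics for one column."""
    path: str
    min_value: int
    max_value: int


def cull_partitions(partitions, low, high):
    """Keep only partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions
            if p.max_value >= low and p.min_value <= high]


partitions = [
    PartitionStats("part-0.parquet", 0, 99),
    PartitionStats("part-1.parquet", 100, 199),
    PartitionStats("part-2.parquet", 200, 299),
    PartitionStats("part-3.parquet", 300, 399),
]

# WHERE key BETWEEN 150 AND 220 only needs two of the four files.
to_scan = cull_partitions(partitions, 150, 220)
print([p.path for p in to_scan])   # ['part-1.parquet', 'part-2.parquet']
```

The same range check could in principle be applied to join keys, skipping partitions on one side whose key range cannot match anything on the other.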

Get Started

BlazingSQL is quick to get up and running using either DockerHub or Conda Install:

https://hub.docker.com/r/blazingdb/blazingsql/

https://github.com/BlazingDB/blazingsql-conda-environment

