GPU Accelerated End-to-End Analytics

Rodrigo Aramburu
[email protected]
@rodaramburu · @blazingdb

GPUs are well known for accelerating the training of machine learning and deep learning models.

Deep Learning (Neural Networks)

Machine Learning

Performance improvements increase at scale.

40x Improvement over CPU.


GPUs are awesome.

Source: NVIDIA CUDA C Programming Guide


GPUs are awesome.

                                 Peak memory bandwidth   Random 8B access
High-end CPU (6-channel DDR4)    120 GB/s                6 GB/s
NVIDIA Tesla V100                900 GB/s                60 GB/s
NVIDIA DGX-2 (16 x V100)         16 x 900 GB/s           16 x 60 GB/s


Except they suck… sometimes.

                                 Peak memory bandwidth   Random 8B access   Memory capacity   PCIe Gen 3 bandwidth
High-end CPU (6-channel DDR4)    120 GB/s                6 GB/s             1 TB+             N/A
NVIDIA Tesla V100                900 GB/s                60 GB/s            32 GB             12 GB/s
NVIDIA DGX-2 (16 x V100)         16 x 900 GB/s           16 x 60 GB/s       512 GB            4 x 12 GB/s
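
To make the bottleneck concrete, here is a rough back-of-the-envelope sketch that uses only the V100 figures above (32 GB of memory, 900 GB/s peak memory bandwidth, 12 GB/s over PCIe Gen 3). The helper name `seconds_to_move` is made up for illustration, and the rates are peak values, so real transfers would likely take longer.

```python
# Figures taken from the slide: V100 memory size and peak bandwidths.
V100_MEMORY_GB = 32
V100_BANDWIDTH_GBPS = 900   # peak on-device memory bandwidth
PCIE_GEN3_GBPS = 12         # host-to-device link


def seconds_to_move(gigabytes, gigabytes_per_second):
    """Idealized transfer time at peak bandwidth (hypothetical helper)."""
    return gigabytes / gigabytes_per_second


fill_over_pcie = seconds_to_move(V100_MEMORY_GB, PCIE_GEN3_GBPS)
scan_on_device = seconds_to_move(V100_MEMORY_GB, V100_BANDWIDTH_GBPS)

print(f"Fill 32 GB over PCIe Gen 3: {fill_over_pcie:.2f} s")
print(f"Scan 32 GB in GPU memory:   {scan_on_device:.3f} s")
print(f"PCIe is {fill_over_pcie / scan_on_device:.0f}x slower")
```

Under these assumptions, getting the data onto the GPU costs far more than reading it once it is there, which is why the rest of the deck focuses on keeping data in GPU memory.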


But don’t worry, hardware to the rescue...

NVLink/NVSwitch

Supports 150 GB/s for GPU-to-GPU communication if you properly use CUDA IPC.

PCIe Gen 4

Gen 3 is ubiquitous, but HPC servers have started integrating Gen 4, doubling PCIe bandwidth.

GPUDirect

With RDMA and InfiniBand, files can bypass OS kernel calls and load directly into GPU memory over a specialized fabric.


Data Center Topology becomes nightmarish...

[Diagram: a CPU connected to PCIe switches; each PCIe switch connects GPUs, NICs, and NVMe drives. PCIe links are labeled 12GB/s; the NVLink + NVSwitch links between GPUs are labeled 150GB/s.]

This is only half!


All the companies in the GPU ecosystem are building the same code.


Open Source Ecosystem

Expertise: GPU DBMS · GPU Columnar Analytics · Data Lakes

Expertise: CUDA · Machine Learning · Deep Learning

Expertise: Python · Data Science · Machine Learning


RAPIDS, the end-to-end GPU analytics ecosystem

cuDF: Data Preparation

cuML: Machine Learning

cuGRAPH: Graph Analytics

Data Preparation · Model Training · Visualization

A set of open source libraries for GPU accelerating data preparation and machine learning.

In GPU Memory

● Launched Oct. 2018

● 16,000+ Installs

● 75+ Contributors








In GPU Memory · Zero-copy reads · Columnar

GPU DataFrame (GDF)

[Diagram: a GDF holding three columns, Col A, Col B, and Col C. Each column consists of Metadata, Values, and a NULLS mask.]
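
The column layout on this slide (metadata, a contiguous values buffer, and a nulls mask) can be sketched in plain Python on the CPU. This is only an illustration of the idea, not cuDF code: the `Column` class is hypothetical, and packing one validity bit per row with the least significant bit first is an assumption borrowed from Arrow-style formats.

```python
from array import array


class Column:
    """Toy columnar layout: metadata + contiguous values + validity bitmask."""

    def __init__(self, name, typecode, items):
        self.name = name                      # metadata
        self.typecode = typecode              # metadata
        self.size = len(items)                # metadata
        # Values live in one contiguous buffer; nulls get a placeholder 0.
        self.values = array(typecode, [0 if v is None else v for v in items])
        # One validity bit per row (1 = valid), packed LSB-first (assumed).
        self.nulls = bytearray((self.size + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                self.nulls[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.nulls[i // 8] >> (i % 8) & 1)

    def sum(self):
        """Column-wise reduction that skips null rows."""
        return sum(v for i, v in enumerate(self.values) if self.is_valid(i))


col_a = Column("col_a", "q", [10, None, 30, 40])
print(col_a.sum())                 # 80
print(bin(col_a.nulls[0]))         # 0b1101
# A memoryview exposes the value buffer without copying it.
view = memoryview(col_a.values)
print(view[2])                     # 30
```

Keeping values contiguous and nulls in a separate mask is what lets another library read the same buffers in place instead of converting them.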








CUDA DataFrame (cuDF)

In GPU Memory · GPU Compute Kernels · Pandas-like API · C++ API


BlazingSQL: The GPU SQL Engine on RAPIDS AI

A SQL engine built on RAPIDS AI.

Query enterprise data lakes lightning fast with full interoperability with the RAPIDS AI stack.


Getting Started Demo


In the life of a query.

[Diagram: components involved in a query]

User: SQL, Python Connector (BlazingSQL Context)

Orchestrator: Coordinator, Parser + Planner (Apache Calcite)

Workers 1 to 4: IO, RAL, cuDF

Output: GPU DataFrame

RAPIDS: cuML, cuGRAPH, cuDNN


GPU Decompression

Decompression on GPUs

[Diagram: Dict- and RLE-encoded columns are decompressed into a GDF with columns Col A, Col B, and Col C, each consisting of Metadata, Values, and a NULLS mask.]

Source: NVIDIA GTC 2018 - Nikolay Sakharnykh (NVIDIA) & Felipe Aramburu (BlazingDB)

● Applying Compression to TPC-H (Q4, SF1000)

● Cascading Compression

● 14x Compression (l_orderkey at SF1000)

New PR (Not Merged)
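
As a rough illustration of cascading two of the schemes named on this slide, the sketch below dictionary-encodes a column and then run-length-encodes the resulting codes. It runs on the CPU in plain Python, the sample data is made up, and the function names are hypothetical; it is not the GPU implementation from the talk and makes no claim about the exact cascade used there.

```python
def dict_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary = sorted(set(values))
    index = {v: code for code, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def rle_encode(codes):
    """Collapse runs of equal codes into [code, run_length] pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs


def decode(dictionary, runs):
    """Undo both stages: expand the runs, then look the codes up."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


# Made-up sorted key column with repeated values.
column = [1001, 1001, 1001, 1002, 1002, 1007, 1007, 1007, 1007]
dictionary, codes = dict_encode(column)
runs = rle_encode(codes)

print(dictionary)   # [1001, 1002, 1007]
print(runs)         # [[0, 3], [1, 2], [2, 4]]
assert decode(dictionary, runs) == column
```

Sorted key columns with many repeats are where this kind of cascade pays off, since the dictionary shrinks the value width and the run lengths absorb the repetition.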


Unified Communication X (UCX)

What is UCX?

UCX is an open-source, production-grade communication framework for data-centric and high-performance applications.


Data Center Topology w/ UCX

[Diagram: the same topology with UCX: a CPU, PCIe switches each connecting GPUs and a NIC, NVMe drives, and NVLink + NVSwitch between GPUs. PCIe links are labeled 12GB/s; NVLink + NVSwitch links are labeled 150GB/s.]


GPU Primitives

cuDF (Single-GPU)

● gdf_radixsort_i8()
● gdf_transpose()
● gdf_inner_join()
● gdf_hash()
● gdf_sum()
● gdf_product()
● gdf_max()
● gdf_filter()
● gdf_group_by_sum()
● gdf_group_by_count()
● gdf_order_by()

IO

● read_csv()
● read_parquet()
● gdf_to_csr()

cuDF (Multi-GPU)

● gdf_hash_partition()
● scatter()
● gather()
● slice()

https://github.com/rapidsai/cudf
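
The idea behind a hash-partition primitive can be sketched in a few lines of CPU-side Python: rows with the same key always land in the same partition, so each GPU can join or aggregate its share independently. The function below is a stand-in written for illustration; it does not reproduce the signature or hash function of `gdf_hash_partition()`, and the sample rows are made up.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Route each row to partition hash(key) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is a stable stand-in for whatever hash the library uses.
        h = zlib.crc32(str(row[key]).encode())
        partitions[h % num_partitions].append(row)
    return partitions


orders = [{"order_id": i, "customer": c}
          for i, c in enumerate(["ana", "bo", "cy", "ana", "bo", "ana"])]

parts = hash_partition(orders, key="customer", num_partitions=2)

# Each partition could then be scattered to a different GPU.
for gpu, part in enumerate(parts):
    print(gpu, [r["order_id"] for r in part])
```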


Distributed Result Sets

[Diagram: BlazingSQL Workers 1 to 4 (IO, RAL, cuDF) each hold result sets as GPU DataFrames (GPU DataFrame0, GPU DataFrame1), referenced by gdf_token[0] and gdf_token[1]. Dask Workers 1 to 4 (IO, cuDF) hold the same GPU DataFrames and pass them to RAPIDS (cuML, cuGraph). The two sides are linked by Zero-Copy IPC.]


BlazingSQL + XGBoost Loan Risk Demo

Train a model to assess risk of new mortgage loans based on Fannie Mae loan performance data.

ETL/Feature Engineering · XGBoost Training

Mortgage Data: 4.22M Loans · 148M Perf. Records · CSV Files on HDFS

CLUSTER: 1 Node + 16 vCPUs per node + 1 Tesla T4 GPU (2560 CUDA Cores, 16GB VRAM)

CLUSTER: 4 Nodes + 8 vCPUs per node + 30GB RAM


RAPIDS + BlazingSQL outperforms traditional CPU pipelines

Demo Timings (ETL Phase)

[Bar chart: time in seconds (axis from 0 to 3000) for four runs: 3.8GB (1 x T4), 3.8GB (4 Nodes), 15.6GB (1 x T4), and 15.6GB (4 Nodes).]


Scale up the data on a DGX-1 (4 x V100 GPUs)


BlazingSQL + Graphistry Netflow Analysis

Visually analyze the VAST netflow data set inside Graphistry in order to quickly detect anomalous events.

ETL · Visualization

Netflow Data: 65M Events · 2 Weeks · 1,440 Devices


Benchmarks

Netflow Demo Timings (ETL Only)


Upcoming BlazingSQL Releases

V0.1 Query GDFs: Use the PyBlazing connection to execute SQL queries on GDFs that are loaded by the cuDF API.

V0.2 Direct Query Flat Files: Integrate FileSystem API, adding the ability to directly query flat files (Apache Parquet & CSV) inside distributed file systems.

V0.3 Distributed Scheduler: SQL queries are fanned out across multiple GPUs and servers.

V0.4 String Support: String support and string operation support.

V0.5 Physical Plan Optimizer: Partition culling for where clauses and joins.
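
Partition culling for a where clause can be sketched as follows, assuming each partition carries minimum and maximum statistics for the filtered column (as Parquet files can). The `Partition` class, the `cull` function, and the file names are hypothetical; this shows the general technique, not BlazingSQL's optimizer.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    path: str       # hypothetical file name
    min_value: int  # smallest value of the filter column in this file
    max_value: int  # largest value of the filter column in this file


def cull(partitions, low, high):
    """Keep only partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions
            if p.max_value >= low and p.min_value <= high]


partitions = [
    Partition("part-0.parquet", 1, 100),
    Partition("part-1.parquet", 101, 200),
    Partition("part-2.parquet", 201, 300),
]

# WHERE key BETWEEN 150 AND 220 only needs two of the three files.
to_scan = cull(partitions, low=150, high=220)
print([p.path for p in to_scan])   # ['part-1.parquet', 'part-2.parquet']
```

Files that cannot contain matching rows are never read, which matters most when the data has to cross a slow link before it reaches the GPU.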


Get Started

BlazingSQL is quick to get up and running using either DockerHub or Conda Install:

https://hub.docker.com/r/blazingdb/blazingsql/

https://github.com/BlazingDB/blazingsql-conda-environment






Date post:	11-Mar-2020