@blazingdb@blazingdb
GPUs are well known for accelerating the training of machine learning and deep learning models.
Deep Learning (Neural Networks)
Machine Learning
Performance improvements increase at scale.
40x Improvement over CPU.
GPUs are awesome.
Source: NVIDIA CUDA C Programming Guide
GPUs are awesome.
                                Peak memory bandwidth   Random 8B access
High-end CPU (6-channel DDR4)   120 GB/s                6 GB/s
NVIDIA Tesla V100               900 GB/s                60 GB/s
NVIDIA DGX-2 (16 x V100)        16 x 900 GB/s           16 x 60 GB/s
Except they suck… sometimes.
                                Peak memory bandwidth   Random 8B access   Memory capacity   PCIe Gen 3 bandwidth
High-end CPU (6-channel DDR4)   120 GB/s                6 GB/s             1 TB+             N/A
NVIDIA Tesla V100               900 GB/s                60 GB/s            32 GB             12 GB/s
NVIDIA DGX-2 (16 x V100)        16 x 900 GB/s           16 x 60 GB/s       512 GB            4 x 12 GB/s
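To make the gap concrete, here is a back-of-the-envelope sketch that uses only the peak figures from the table above. The 32 GB working set is simply one V100's memory capacity, and the helper name `seconds_to_move` is made up for illustration; real transfers add latency and rarely hit peak bandwidth.

```python
# Peak bandwidth figures in GB/s, taken from the table above.
CPU_DDR4 = 120
V100_MEMORY = 900
PCIE_GEN3 = 12


def seconds_to_move(gigabytes, gb_per_s):
    """Ideal time to stream `gigabytes` at peak bandwidth (ignores latency)."""
    return gigabytes / gb_per_s


working_set_gb = 32  # one V100's memory capacity

scan_on_gpu = seconds_to_move(working_set_gb, V100_MEMORY)
scan_on_cpu = seconds_to_move(working_set_gb, CPU_DDR4)
load_over_pcie = seconds_to_move(working_set_gb, PCIE_GEN3)

print(f"scan 32 GB already in GPU memory: {scan_on_gpu:.3f} s")
print(f"scan 32 GB in CPU memory:         {scan_on_cpu:.3f} s")
print(f"load 32 GB over PCIe Gen 3:       {load_over_pcie:.3f} s")
print(f"PCIe load vs. GPU scan:           {load_over_pcie / scan_on_gpu:.0f}x slower")
```

Under these assumptions, getting the data onto the GPU costs far more than scanning it once it is there, which is why the PCIe column is where GPUs "suck".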
But don’t worry, hardware to the rescue...
NVLink/NVSwitch
Supports 150 GB/s for GPU-to-GPU communication if you properly use CUDA IPC.
PCIe Gen 4
Gen 3 is ubiquitous, but HPC servers have started integrating Gen 4, doubling PCIe bandwidth.
GPU Direct
With RDMA and InfiniBand, files can skip the OS kernel call and load directly into GPU memory over a specialized fabric.
Data Center Topology becomes nightmarish...
[Diagram: a CPU fans out through PCIe switches to GPUs, NICs, and NVMe drives. PCIe links are labeled 12GB/s; the NVLink + NVSwitch fabric is labeled 150GB/s.]
This is only half!
All the companies in the GPU ecosystem are building the same code.
Open Source Ecosystem
Expertise: GPU DBMS · GPU Columnar Analytics · Data Lakes
Expertise: CUDA · Machine Learning · Deep Learning
Expertise: Python · Data Science · Machine Learning
RAPIDS, the end-to-end GPU analytics ecosystem
cuDF: Data Preparation
cuML: Machine Learning
cuGraph: Graph Analytics
Data Preparation · Model Training · Visualization
A set of open source libraries for GPU-accelerating data preparation and machine learning.
In GPU Memory
● Launched Oct. 2018
● 16,000+ Installs
● 75+ Contributors
In GPU Memory · Zero-copy reads · Columnar
GPU DataFrame (GDF): columns Col A, Col B, and Col C, each stored as Metadata, Values, and a NULLS mask.
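The column layout above (metadata, values, nulls) can be sketched on the CPU in a few lines of Python. This is an illustration only, assuming an Arrow-style validity bitmask with one bit per row, least-significant bit first; the `Column` class and its methods are hypothetical names, not the GDF or cuDF API.

```python
from array import array


class Column:
    """One column: type metadata, a contiguous values buffer, a validity bitmask."""

    def __init__(self, name, typecode, items):
        self.name = name          # metadata
        self.typecode = typecode  # metadata: element type of the values buffer
        # Null slots still occupy space in the values buffer; 0 is a placeholder.
        self.values = array(typecode, [0 if v is None else v for v in items])
        # One bit per row, rounded up to whole bytes; a set bit means "not null".
        self.valid = bytearray((len(items) + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                self.valid[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.valid[i // 8] & (1 << (i % 8)))

    def null_count(self):
        return len(self.values) - sum(bin(b).count("1") for b in self.valid)

    def sum(self):
        # Aggregations consult the bitmask and skip null rows.
        return sum(v for i, v in enumerate(self.values) if self.is_valid(i))


col_a = Column("Col A", "q", [10, None, 30, 40, None, 60, 70, 80, 90])
print(col_a.null_count(), col_a.sum())

# A memoryview exposes the same buffer without copying it, which is the
# CPU-side analogue of a zero-copy read.
view = memoryview(col_a.values)
print(view[2])
```

Keeping values contiguous and nulls in a separate mask is what lets a consumer read the buffers in place instead of converting row by row.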
In GPU Memory · GPU Compute Kernels · Pandas-like API · C++ API
cuDF (CUDA DataFrame): Data Preparation
BlazingSQL: The GPU SQL Engine on RAPIDS AI
A SQL engine built on RAPIDS AI.
Query enterprise data lakes lightning fast, with full interoperability with the RAPIDS AI stack.
Getting Started Demo
In the life of a query.
[Diagram components: User SQL; Python Connector (BlazingSQL Context); Orchestrator / Coordinator; Parser + Planner (Apache Calcite); Workers 1 to 4, each with IO, RAL, and cuDF; GPU DataFrame; RAPIDS, cuML, cuGraph, cuDNN.]
GPU Decompression
Decompression on GPUs
[Diagram: Dict- and RLE-compressed inputs feeding GDF columns Col A, Col B, and Col C, each with Metadata, Values, and a NULLS mask.]
Source: NVIDIA GTC 2018 - Nikolay Sakharnykh (NVIDIA) & Felipe Aramburu (BlazingDB)
● Applying Compression to TPC-H (Q4, SF1000)
● Cascading Compression
● 14x Compression (l_orderkey at SF1000)
New PR (Not Merged)
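As a rough illustration of cascading two of the schemes named above, the sketch below dictionary-encodes a column and then run-length-encodes the resulting codes. It is plain CPU Python with made-up function names and sample data, and the dictionary-then-RLE ordering is an assumption for the example, not the scheme from the GTC talk; a GPU implementation would decode runs in parallel rather than in a loop.

```python
def dict_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def rle_encode(codes):
    """Collapse consecutive repeats into [code, run_length] pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs


def rle_decode(runs):
    out = []
    for code, length in runs:
        out.extend([code] * length)
    return out


def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


# Hypothetical sample column with long runs of repeated values.
column = ["AIR"] * 4 + ["RAIL"] * 3 + ["AIR"] * 2 + ["SHIP"] * 5

dictionary, codes = dict_encode(column)   # stage 1: Dict
runs = rle_encode(codes)                  # stage 2: RLE over the codes
restored = dict_decode(dictionary, rle_decode(runs))  # decode in reverse order

print(dictionary, runs)
print(restored == column)
```

Each stage feeds a more compressible input to the next, and decompression simply unwinds the stages in reverse.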
Unified Communication X (UCX)
What is UCX?
UCX is an open-source, production-grade communication framework for data-centric and high-performance applications.
Data Center Topology w/ UCX
[Diagram: a CPU fans out through PCIe switches to GPUs, NICs, and NVMe drives. PCIe links are labeled 12GB/s; the NVLink + NVSwitch fabric is labeled 150GB/s.]
GPU Primitives
cuDF (Single-GPU)
● gdf_radixsort_i8()
● gdf_transpose()
● gdf_inner_join()
● gdf_hash()
● gdf_sum()
● gdf_product()
● gdf_max()
● gdf_filter()
● gdf_group_by_sum()
● gdf_group_by_count()
● gdf_order_by()
IO
● read_csv()
● read_parquet()
● gdf_to_csr()
cuDF (Multi-GPU)
● gdf_hash_partition()
● scatter()
● gather()
● slice()
https://github.com/rapidsai/cudf
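The idea behind a hash-partition primitive can be sketched in plain Python: rows with the same key always land in the same partition, so each worker can run a local group-by and the results can be merged without further coordination. The functions `hash_partition` and `group_by_sum` below are illustrative stand-ins with assumed signatures, not the cuDF `gdf_*` calls listed above.

```python
def hash_partition(rows, key, num_partitions):
    """Route each row to partition hash(key) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CPython hashes small non-negative ints to themselves, so this is
        # deterministic here; a real engine would use a dedicated hash function.
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def group_by_sum(rows, key, value):
    """Local group-by sum, as one worker would run on its own partition."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


# Hypothetical sample rows.
rows = [
    {"k": 1, "v": 10}, {"k": 2, "v": 20}, {"k": 1, "v": 5},
    {"k": 3, "v": 7}, {"k": 2, "v": 1}, {"k": 4, "v": 3},
]

partitions = hash_partition(rows, "k", num_partitions=2)
local_results = [group_by_sum(p, "k", "v") for p in partitions]

# Keys are disjoint across partitions, so merging is a plain union.
merged = {}
for result in local_results:
    merged.update(result)

print(merged)
```

The same routing applies to joins: partitioning both tables on the join key means matching rows meet on the same worker.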
Distributed Result Sets
[Diagram: BlazingSQL Workers 1 to 4, each with IO, RAL, and cuDF, hold GPU DataFrame0 and GPU DataFrame1, referenced by gdf_token[0] and gdf_token[1]. Dask Workers 1 to 4, each with IO and cuDF, reach the same GPU DataFrames through Zero-Copy IPC and pass them to RAPIDS cuML and cuGraph.]
BlazingSQL + XGBoost Loan Risk Demo
Train a model to assess risk of new mortgage loans based on Fannie Mae loan performance data.
ETL/Feature Engineering · XGBoost Training
Mortgage Data: 4.22M Loans · 148M Perf. Records · CSV Files on HDFS
Cluster (1 x T4): 1 node · 16 vCPUs per node · 1 Tesla T4 GPU · 2560 CUDA cores · 16GB VRAM
Cluster (4 Nodes): 4 nodes · 8 vCPUs per node · 30GB RAM
RAPIDS + BlazingSQL outperforms traditional CPU pipelines
Demo Timings (ETL Phase)
[Bar chart, time in seconds (axis 0 to 3000): 3.8GB (1 x T4), 3.8GB (4 Nodes), 15.6GB (1 x T4), 15.6GB (4 Nodes).]
Scale up the data on a DGX-1 (4 x V100 GPUs)
BlazingSQL + Graphistry Netflow Analysis
Visually analyze the VAST netflow data set inside Graphistry in order to quickly detect anomalous events.
ETL · Visualization
Netflow Data: 65M Events · 2 Weeks · 1,440 Devices
Benchmarks
Netflow Demo Timings (ETL Only)
Upcoming BlazingSQL Releases
V0.1 Query GDFs: Use the PyBlazing connection to execute SQL queries on GDFs that are loaded by the cuDF API.
V0.2 Direct Query Flat Files: Integrate the FileSystem API, adding the ability to directly query flat files (Apache Parquet & CSV) inside distributed file systems.
V0.3 Distributed Scheduler: SQL queries are fanned out across multiple GPUs and servers.
V0.4 String Support: String support and string operation support.
V0.5 Physical Plan Optimizer: Partition culling for where clauses and joins.
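Partition culling for a where clause can be pictured as follows: if each file carries min/max statistics for a column, the planner can skip files whose range cannot match the predicate. This is a minimal sketch under that assumption; `PartitionStats`, `cull`, and the file names are hypothetical and do not describe BlazingSQL's actual optimizer.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    path: str        # hypothetical file name
    min_value: int   # smallest value of the filtered column in this file
    max_value: int   # largest value of the filtered column in this file


def cull(partitions, lo, hi):
    """Keep only partitions whose [min, max] range overlaps lo <= x <= hi."""
    return [p for p in partitions if p.max_value >= lo and p.min_value <= hi]


partitions = [
    PartitionStats("part-0.parquet", 1, 100),
    PartitionStats("part-1.parquet", 101, 200),
    PartitionStats("part-2.parquet", 201, 300),
]

# Equivalent of: WHERE x BETWEEN 150 AND 220
survivors = cull(partitions, 150, 220)
print([p.path for p in survivors])
```

Only the surviving files need to be read and shipped to a GPU, which matters given how scarce PCIe bandwidth is relative to GPU memory bandwidth.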
Get Started
BlazingSQL is quick to get up and running using either DockerHub or Conda Install:
https://hub.docker.com/r/blazingdb/blazingsql/
https://github.com/BlazingDB/blazingsql-conda-environment