GPU-Accelerated Analytics on your Data Lake.
Data Lake
@blazingdb
Data Swamp
ETL Hell
[Diagram: Data Lake with a Common Data Layer]
Simplify Data Storage
SCHEMA
METADATA
DATA
SQL Warehouse on Data Lake
BlazingDB – How it works
• Compression/Decompression
• Filtering (Predicate Pushdown)
• Aggregations
• Transformations
• Joins
• Sorting/Ordering
• RAM Cache (Hot)
• Disk Cache (Medium)
• HDD
• SSD
Data Lake: Local Disk, HDFS, AWS S3
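The tiers listed above (RAM cache, disk cache, data lake storage) can be read as a read-through lookup. The following is a minimal sketch of that idea only, not BlazingDB's implementation: the class `TieredReader`, the callable `fetch_from_lake`, and the file-naming scheme are all hypothetical names invented for illustration.

```python
import os
import tempfile


class TieredReader:
    """Read-through lookup: RAM cache (hot), disk cache (medium), data lake (cold)."""

    def __init__(self, fetch_from_lake, disk_dir):
        # fetch_from_lake is a hypothetical callable (key -> bytes) standing in
        # for a read from local disk, HDFS, or S3.
        self.fetch_from_lake = fetch_from_lake
        self.disk_dir = disk_dir
        self.ram = {}

    def _disk_path(self, key):
        # Flatten the key into a single file name inside the cache directory.
        return os.path.join(self.disk_dir, key.replace("/", "_"))

    def read(self, key):
        """Return (data, tier), where tier names the level that served the read."""
        if key in self.ram:
            return self.ram[key], "hot"
        path = self._disk_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            self.ram[key] = data  # promote to the RAM cache
            return data, "medium"
        data = self.fetch_from_lake(key)
        with open(path, "wb") as f:
            f.write(data)  # populate the disk cache
        self.ram[key] = data  # populate the RAM cache
        return data, "cold"


if __name__ == "__main__":
    reader = TieredReader(lambda key: b"column chunk", tempfile.mkdtemp())
    print(reader.read("lineitem/part-0")[1])  # served from the data lake
    print(reader.read("lineitem/part-0")[1])  # served from RAM
```

A first read of a key goes to the data lake and fills both caches; a repeat read is served from RAM, and a read after the RAM cache is cleared is served from the disk cache.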
BlazingDB Multi-Node Cluster
Shared Data Architecture
[Diagram: Data Lake]
The Nays
• No Vendor Lock-in
• No Consistency Management
• No BlazingDB-Specific ETL
• No Duplication
• No Ingest
The Yays
• High Concurrency
• Data Sharing (Across Clusters and Other Tools)
• Multi-Terabyte Queries
• Scalable, On-Demand Data Warehouse
• Incredibly Fast SQL
DEMO
Demo - Architecture
HDFS on Azure
Azure GPU Servers
• NC24 V1
• 4 Servers
Queries: BlazingDB 4-Node Query Times (Lower Is Better)
[Bar chart: time in seconds for Query 1 through Query 5 under three cache states: Cold, Medium (disk cache only), and Hot. Data labels as extracted: 142.1, 281.1, 380.5, 135.5, 46, 73.6, 154.1, 251.8, 73.8, 46.3, 72, 63.1, 14, 12.2, 14.9]
Query 1
[Chart: Query 1 time in seconds for Cold, Medium (disk cache only), and Hot]
select l_returnflag, l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(l_quantity) as count_order
from lineitem
where l_shipdate <= '1995-06-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
Query 1 Data Points
• 6 billion row table
• Many aggregations/transformations
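To make the shape of Query 1's output concrete, here is a minimal sketch that runs the query against a three-row toy table. Everything outside the query text is an assumption: SQLite is only a stand-in engine (not BlazingDB), the column types are guessed, and the rows are invented for illustration. The query text is embedded in the sketch, with columns spelled `l_extendedprice` and `l_quantity`.

```python
import sqlite3

QUERY_1 = """
select l_returnflag, l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(l_quantity) as count_order
from lineitem
where l_shipdate <= '1995-06-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
"""

# Invented toy rows: (returnflag, linestatus, quantity, extendedprice,
# discount, tax, shipdate). The third row falls outside the date filter.
TOY_ROWS = [
    ("A", "F", 10, 100.0, 0.5, 0.0, "1995-01-01"),
    ("A", "F", 20, 200.0, 0.0, 0.0, "1995-05-01"),
    ("N", "O", 5, 50.0, 0.0, 0.0, "1996-01-01"),
]


def run_query_1(rows):
    """Load rows into an in-memory SQLite table and run Query 1 on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table lineitem ("
        "l_returnflag text, l_linestatus text, l_quantity real, "
        "l_extendedprice real, l_discount real, l_tax real, l_shipdate text)"
    )
    conn.executemany("insert into lineitem values (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY_1).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_query_1(TOY_ROWS):
        print(row)
```

The date filter works here because ISO-formatted date strings compare correctly as text; the toy table yields a single group for return flag A and line status F.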
Query 2
[Chart: Query 2 time in seconds for Cold, Medium (disk cache only), and Hot]
select lineitem.l_orderkey,
    sum(lineitem.l_extendedprice*(1-lineitem.l_discount)) as revenue,
    orders.o_orderdate, orders.o_shippriority
from customer
inner join orders on customer.c_custkey = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
where customer.c_mktsegment = 'BUILDING'
    and orders.o_orderdate < '1995-03-15'
    and lineitem.l_shipdate > '1995-03-15'
group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
order by revenue desc, orders.o_orderdate;
Query 2 Data Points
• Join 6B rows to 1.5B rows to 150M rows
• Many aggregations/transformations
• Order (sorting)
Query 3
[Chart: Query 3 time in seconds for Cold, Medium (disk cache only), and Hot]
select nation.name,
    sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue
from customer
inner join orders on customer.cust_key = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
inner join supplier on lineitem.l_suppkey = supplier.s_suppkey
inner join nation on supplier.s_nationkey = nation.nation_key
inner join region on nation.region_key = region.r_regionkey
where supplier.s_nationkey = nation.nation_key
    and region.r_name = 'ASIA'
    and orders.o_orderdate >= '19940101'
    and orders.o_orderdate < '19950101'
group by nation.name
order by revenue desc
Query 3 Data Points
• Join 6B rows to 1.5B rows to 150M rows (and many small joins)
• Multiple aggregations/transformations
• Order (sorting)
Query 4
[Chart: Query 4 time in seconds for Cold, Medium (disk cache only), and Hot]
select sum(l_extendedprice) as sum_exprice,
sum(l_discount) as sum_discount
from lineitem
where l_shipdate >= '19940101'
and l_shipdate < '19950101'
and l_discount >= 0.05 and l_discount <= 0.07
and l_quantity < 24
Query 4 Data Points
• 6B row table
• Multiple aggregations/transformations
Query 5
[Chart: Query 5 time in seconds for Cold, Medium (disk cache only), and Hot]
select supplier.s_acctbal, supplier.s_suppkey, nation.name,
    part.p_partkey, part.p_mfgr, supplier.s_address, supplier.s_phone,
    supplier.s_comment
from supplier
inner join partsupp on supplier.s_suppkey = partsupp.ps_suppkey
inner join nation on supplier.s_nationkey = nation.nation_key
inner join region on nation.region_key = region.r_regionkey
inner join part on part.p_partkey = partsupp.ps_partkey
where part.p_size = 15
    and part.p_type in (
        'ECONOMY ANODIZED BRASS', 'ECONOMY BRUSHED BRASS',
        'ECONOMY BURNISHED BRASS', 'ECONOMY PLATED BRASS',
        'ECONOMY POLISHED BRASS', 'LARGE ANODIZED BRASS',
        'LARGE BRUSHED BRASS', 'LARGE BURNISHED BRASS',
        'LARGE PLATED BRASS', 'LARGE POLISHED BRASS',
        'SMALL ANODIZED BRASS', 'SMALL BRUSHED BRASS',
        'SMALL BURNISHED BRASS', 'SMALL PLATED BRASS',
        'SMALL POLISHED BRASS', 'STANDARD ANODIZED BRASS',
        'STANDARD BRUSHED BRASS', 'STANDARD BURNISHED BRASS',
        'STANDARD PLATED BRASS', 'STANDARD POLISHED BRASS')
    and region.r_name = 'EUROPE'
order by supplier.s_acctbal desc, supplier.s_suppkey, nation.name,
    part.p_partkey
Query 5 Data Points
• Join multiple tables
• Many aggregations/transformations
• String comparisons
Data Pipeline
GPU Data Frame
Apache Arrow
Common Data Layer
INGEST
STORAGE (Data Lake)
Coming Soon
Questions?