Agenda
• Vertica vs. the World
• What Is Vertica?
• How Does It Work?
• How to Use Vertica … (the Right Way)
• Where It Falls Short
• Drill Down into SQL (Group By & Joins)
Close Your Eyes
Imagine Your System
It needs to support:
• 1,000,000 concurrent users
• 1,000,000 operations/s
• Microsecond read & write latency
• Complex analytics queries with seconds-level latency
• ACID
• Highly available
• Scalable
Open Your Eyes
What Do You See ?
Vertica
Oracle
Couchbase
Cassandra
MongoDB
MySQL
Exadata
Vertica vs. the World

                   Vertica                 Oracle                  Cassandra                    Couchbase
Scale              MPP                     Single server*          MPP                          MPP
Data model         Relational, structured  Relational, structured  Column store, schema-less    Document, schema-less
Transaction model  ACID                    ACID                    Eventually consistent        Consistent
DR                 Application solution    Standby, read-only      Active-Active                Active-Active
Development        SQL…                    SQL…                    Python, Java, CQL…           Python, Java, PHP…
Best for           Analytics               Generic, OLTP           Write-intensive key-value    Read/write-intensive JSON documents
CAP                CP                      N/A                     AP                           CP
Use Cases
• Real-time dashboarding (5,000 concurrent users, heavy writes and simple fetches)
• Real-time complex analytics
• Billing
• Blog site
Cassandra
Vertica
Oracle
Couchbase
MPP-Columnar DBMS
• 10x–100x the performance of a classic RDBMS
• Linear scale
• SQL
• Commodity hardware
• Built-in fault tolerance
10x–100x the Performance of a Classic RDBMS
Column store architecture:
• High compression rates
• Sorted columns
• Object segmentation/replication
Regular table

Continent   Country  City      Size    Size type  Population
Asia        Israel   Tel Aviv  52000   Acres      450000
N.America   USA      Dallas    385     Sq. miles  1200000

Create Table …
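As a sketch, the example table above might be created like this (the table name `cities` and the column types are assumptions, not from the original slides):

```sql
-- Hypothetical DDL for the example table; names and types assumed
CREATE TABLE cities (
    continent  VARCHAR(20),
    country    VARCHAR(40),
    city       VARCHAR(40),
    size       INTEGER,
    size_type  VARCHAR(20),
    population INTEGER
);
```

In Vertica, this logical table is then physically stored as one or more projections, as the following slides show.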
Rows vs. Columns

Continent
• Asia
• Asia
• Asia
• N.America
• N.America
• N.America
Country
• Israel
• Israel
• Israel
• USA
• USA
• USA
Size Type
• Sq. miles
• Sq. miles
• Sq. miles
• Sq. miles
• Sq. miles
• Sq. miles
City size
• 52000
• 78000
• 63000
• 385
• 468
• 8700
City Name
• Tel Aviv
• Jerusalem
• Haifa
• Dallas
• New York
• New Jersey
Population
• 450000
• 800000
• 268000
• 1200000
• 8200000
• 8800000
Block1
•Asia•Israel•Sq. miles•Tel Aviv
Block2
•52000•450000•Asia
Block3
•Israel•Sq. miles•Jerusalem
Block4
•78000•800000•N.America
Block 5
•Usa•Dallas•Sq. miles•385
Block 6
•1200000•Asia•Israel
Block 7
•Haifa•Sq. miles•63000
Block 8
•268000•N.America•Usa
Block 9
•New York•Sq. miles•468•8200000
Continent:  Asia,3  N.America,3                          → RLE encoding
Country:    Israel,3  USA,3                              → RLE encoding
Size Type:  Dunam,3  Sq. miles,3                         → RLE encoding
City size:  52000 78000 63000 385 468 8700               → DELTAVAL encoding
City Name:  Tel Aviv, Jerusalem, Haifa, Dallas, New York, New Jersey   → RLE encoding
Population: 450000 800000 268000 1200000 8200000 8800000 → LZO encoding
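Per-column encodings like the ones above can be requested explicitly in a projection definition. This is only a sketch (the `cities` table and projection name are assumptions carried over from the earlier example):

```sql
-- Sketch: assigning an encoding per column in a projection.
-- RLE pays off on low-cardinality, sorted columns; DELTAVAL on
-- numeric columns with values close together.
CREATE PROJECTION cities_enc (
    continent  ENCODING RLE,
    country    ENCODING RLE,
    size_type  ENCODING RLE,
    size       ENCODING DELTAVAL,
    city,
    population
) AS
SELECT continent, country, size_type, size, city, population
FROM cities
ORDER BY continent, country, size_type;  -- sort order makes the RLE runs long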
Rows vs. Columns
• Conversion table (~2 billion rows a month)
  – Oracle: uncompressed => 418 GB; compressed (manual) => 147 GB
  – Vertica: 21 GB
Saving: 71%
How Does It Work ?
Tuple Mover

ROS:
Asia,23  N.America,13
Israel,23  USA,13
Natanya,1  Zoran,1 …
Seattle,1  Chicago,1  Austin,1 …

WOS:
Asia,2  N.America,3
Israel,2  USA,1
Jerusalem,1  Tel Aviv,1 …
Dallas,1  New Jersey,1  New York,1 …
Tuple Mover Flow

N.America  USA     Dallas      Sq. miles  385    1200000
Asia       Israel  Tel Aviv    Sq. miles  52000  450000
N.America  USA     New York    Sq. miles  468    8200000
N.America  USA     New Jersey  Sq. miles  8700   8800000
Asia       Israel  Jerusalem   Sq. miles  78000  800000

Merged ROS:
Asia,25  N.America,16
Israel,25  USA,16
Jerusalem,1  Natanya,1  Tel Aviv,1  Zoran,1 …  Austin,1  Chicago,1  Dallas,1  New Jersey,1  New York,1  Seattle,1 …
Projections
• Physical structure of the table (the table itself is logical)
• Stored sorted and compressed
• Maintained internally
• At least one (super) projection
• Projection types:
  – Super projection
  – Query-specific projection
  – Pre-join projection
  – Buddy projection
Projections
How do I build my projections?
• Use the DBD (Database Designer)
• Choose the right columns (general vs. specific)
• Choose the right sort order
• Choose the right encoding
• Choose the right column to partition by
• Choose the right column to segment by
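The choices above all come together in the projection DDL. A minimal sketch (table, projection, and column names are assumptions modeled on the fact table used later in this deck):

```sql
-- Sketch of a query-specific projection:
-- sort by the columns you filter and group on, and segment by a
-- high-cardinality column so rows spread evenly across nodes.
CREATE PROJECTION fact_visit_p1 AS
SELECT lp_account_id,
       visit_from_dt_trunc,
       vs_lp_session_id
FROM fact_visit
ORDER BY lp_account_id, visit_from_dt_trunc
SEGMENTED BY HASH(vs_lp_session_id) ALL NODES;
```

The ORDER BY clause is what determines whether later queries can use the pipelined (sorted) operators shown in the SQL examples below.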
Rules of thumb (don't tell Tom Kyte)
• Avoid "select * …"
• Denormalize
• Use bulk loads for DML
• Use merge joins for large joins
• Understand Vertica's architecture & your data
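For bulk loading, the idiomatic tool is COPY rather than row-by-row INSERTs. A hedged sketch (the file path, delimiter, and table name are assumptions; DIRECT writes straight to ROS in Vertica versions that still have a WOS):

```sql
-- Bulk load: one COPY statement instead of millions of INSERTs
COPY fact_visit
FROM '/data/visits.csv'
DELIMITER ','
DIRECT;  -- bypass the WOS and write sorted, compressed ROS containers
```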
Delete/Update
• Deleted rows are only marked as deleted
• The marks are stored in delete vectors on disk
• Queries merge the ROS with the delete vectors to filter out deleted records
• Data is physically removed asynchronously, during mergeout
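You can watch the delete vectors accumulate through the monitoring schema. A sketch (the `v_monitor.delete_vectors` system table exists in Vertica; the exact column list here is abridged and should be checked against your version):

```sql
-- How many logically deleted rows are waiting for mergeout?
SELECT projection_name,
       SUM(deleted_row_count) AS deleted_rows
FROM v_monitor.delete_vectors
GROUP BY projection_name
ORDER BY deleted_rows DESC;
```

Large counts here are a hint that queries are paying the ROS-plus-delete-vector merge cost described above.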
Delete/Update: strata issue
Mergeout: too many ROS containers
500 MB → 2 GB → 4 GB
Where It Falls Short …
• Lack of features
• Documentation
• Good only for specific types of queries
Let’s Dive into SQL Examples
1. Sort optimization
2. Join optimization
Choose the Right Sort Order — Example

select a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from lp_15744040.FACT_VISIT_ROOM a11
group by a11.LP_ACCOUNT_ID;
First projection …

table_name       projection_name     projection_column_name  column_position  sort_position
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VS_LP_SESSION_ID 0 0
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad LP_ACCOUNT_ID 1 1
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VS_LP_VISITOR_ID 2 2
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VISIT_FROM_DT_TRUNC 3 3
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad ACCOUNT_ID 4 4
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad ROOM_ID 5 5
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VISIT_FROM_DT_ACTUAL 6 6
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VISIT_TO_DT_ACTUAL 7 7
FACT_VISIT_ROOM FACT_VISIT_ROOM_bad HOT_LEAD_IND 8 8
Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 7M, Rows: 10K] (PATH ID: 2)
| |      Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | |      Projection: lp_15744040.FACT_VISIT_ROOM_bad
| | |      Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
Second projection …

table_name       projection_name      projection_column_name  column_position  sort_position
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 LP_ACCOUNT_ID 0 0
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VS_LP_SESSION_ID 1 1
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VS_LP_VISITOR_ID 2 2
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VISIT_FROM_DT_TRUNC 3 3
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 ACCOUNT_ID 4 4
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 ROOM_ID 5 5
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VISIT_FROM_DT_ACTUAL 6 6
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VISIT_TO_DT_ACTUAL 7 7
FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 HOT_LEAD_IND 8 8
Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 2)
| |      Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | |      Projection: lp_15744040.FACT_VISIT_ROOM_fix1
| | |      Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
Results …

First projection — GROUPBY HASH (SORT OUTPUT):
Time: First fetch (7 rows): 264527.916 ms. All rows formatted: 264527.978 ms

Second projection — GROUPBY PIPELINED:
Time: First fetch (7 rows): 38913.909 ms. All rows formatted: 38913.965 ms
[Diagram: GROUP BY HASH vs. GROUP BY PIPELINED. With unsorted input, GROUP BY HASH must build a value→count hash table holding every group at once; with input sorted on the group-by columns, the pipelined operator counts each run of equal values as it streams past, holding only one group at a time.]
Join Example

select a12.DT_WEEK AS DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from zzz.FACT_VISIT a11
join zzz.DIM_DATE_TIME a12
  on (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
group by a12.DT_WEEK, a11.LP_ACCOUNT_ID;
Filter:   LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC
Group By: DT_WEEK, LP_ACCOUNT_ID
Join:     VISIT_FROM_DT_TRUNC = DATE_TIME_ID
Select:   DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID
Full Explain Plan …

Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
| |      Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| |      Execute on: All Nodes
| | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | |      Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
| | |      Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| | |      Execute on: All Nodes
| | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 421K, Rows: 372M (NO STATISTICS)] (PATH ID: 4)
| | | |    Projection: zzz.FACT_VISIT_b0
| | | |    Materialize: a11.VISIT_FROM_DT_TRUNC
| | | |    Filter: (a11.LP_ACCOUNT_ID = '57386690')
| | | |    Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
| | | |    Execute on: All Nodes
| | | +-- Inner -> STORAGE ACCESS for a12 [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
| | | |    Projection: zzz.DIM_DATE_TIME_node0004
| | | |    Materialize: a12.DATE_TIME_ID, a12.DT_WEEK
| | | |    Filter: ((a12.DATE_TIME_ID >= '2011-09-01 15:28:00'::timestamp) AND (a12.DATE_TIME_ID <= '2011-12-31 12:52:50'::timestamp))
| | | |    Execute on: All Nodes
Explain Plan (extract) …

Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
| |      Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| |      Execute on: All Nodes
| | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | |      Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
| | |      Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| | |      Execute on: All Nodes
Time: First fetch (6 rows): 56654.894 ms. All rows formatted: 56654.988 ms
Solution One — Functions

select week(a11.VISIT_FROM_DT_TRUNC) AS DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from zzz.FACT_VISIT a11
where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
group by week(a11.VISIT_FROM_DT_TRUNC), a11.LP_ACCOUNT_ID;
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 127, Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: <SVAR>, a11.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 126, Rows: 1 (STALE STATISTICS)] (PATH ID: 2)
| |      Group By: (date_part('week', a11.VISIT_FROM_DT_TRUNC))::int, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 125, Rows: 1 (STALE STATISTICS)] (PATH ID: 3)
| | |      Projection: zzz.FACT_VISIT_b0

Time: First fetch (6 rows): 33453.997 ms. All rows formatted: 33454.154 ms
Saved the Join Time
Solution Two — Pre-Join Projection

Pros:
• Eliminates the join overhead
• Maintained by Vertica

Cons:
• Not flexible
• Adds overhead on load
• Needs a primary/foreign key
• Maintenance restrictions
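A pre-join projection along these lines could be defined as follows. This is only a sketch: the constraint names are invented, the column list is abridged, and the PK/FK pair is the prerequisite called out in the cons above.

```sql
-- Prerequisite: a PK on the dimension and an FK on the fact
-- (constraint names pk_dt / fk_dt are hypothetical)
ALTER TABLE dim_date_time
    ADD CONSTRAINT pk_dt PRIMARY KEY (date_time_id);
ALTER TABLE fact_visit
    ADD CONSTRAINT fk_dt FOREIGN KEY (visit_from_dt_trunc)
    REFERENCES dim_date_time (date_time_id);

-- Sketch of the pre-join projection; sorting on the group-by columns
-- is what later enables GROUPBY PIPELINED instead of GROUPBY HASH
CREATE PROJECTION visit_date_time_prejoin AS
SELECT f.lp_account_id,
       f.vs_lp_session_id,
       f.visit_from_dt_trunc,
       d.dt_week
FROM fact_visit f
JOIN dim_date_time d ON f.visit_from_dt_trunc = d.date_time_id
ORDER BY d.dt_week, f.lp_account_id, f.vs_lp_session_id;
```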
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 12K, Rows: 10K] (PATH ID: 1)
|  Aggregates: count(DISTINCT visit_date_time_prejoin8_b0.VS_LP_SESSION_ID)
|  Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 11K, Rows: 10K] (PATH ID: 2)
| |      Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID, visit_date_time_prejoin8_b0.VS_LP_SESSION_ID
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for <No Alias> [Cost: 8K, Rows: 1M] (PATH ID: 3)
| | |      Projection: lp_15744040.visit_date_time_prejoin8_b0
Solution Two — Pre-Join Projection
Sorted by LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC, DT_WEEK, HOT_LEAD_IND, DATE_TIME_ID, VS_LP_SESSION_ID

Time: First fetch (6 rows): 35312.331 ms. All rows formatted: 35312.421 ms
Saved the join time
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 542K, Rows: 10K] (PATH ID: 1)
|  Aggregates: count(DISTINCT visit_date_time_prejoin_z6.VS_LP_SESSION_ID)
|  Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY PIPELINED [Cost: 542K, Rows: 10K] (PATH ID: 2)
| |      Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.VS_LP_SESSION_ID, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for <No Alias> [Cost: 501K, Rows: 15M] (PATH ID: 3)
| | |      Projection: lp_15744040.visit_date_time_prejoin_z6
Solution Two — Pre-Join Projection
Sorted by DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID

Time: First fetch (6 rows): 3680.853 ms. All rows formatted: 3680.969 ms
Saved the join time and the group-by hash time
Solution Three — Denormalize

select DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from zzz.FACT_VISIT_Z1 a11
where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
group by DT_WEEK, a11.LP_ACCOUNT_ID;
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| |      Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 2M, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | |      Projection: zzz.FACT_VISIT_Z1_super

Time: First fetch (6 rows): 33885.178 ms. All rows formatted: 33885.253 ms
Saved the join time
Solution Three — Denormalize
• Changing the projection sort order
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 588K, Rows: 10K] (PATH ID: 1)
|  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
|  Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
|  Execute on: All Nodes
| +---> GROUPBY PIPELINED [Cost: 587K, Rows: 10K] (PATH ID: 2)
| |      Group By: a11.DT_WEEK, a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 531K, Rows: 20M] (PATH ID: 3)
| | |      Projection: zzz.fact_visit_z1_pipe
| | |      Materialize: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | |      Filter: (a11.LP_ACCOUNT_ID = '57386690')
| | |      Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
| | |      Execute on: All Nodes

Time: First fetch (6 rows): 4313.497 ms. All rows formatted: 4313.600 ms
Saved the join time and the group-by hash time
Keep it simple.
Keep it sorted.
*** Keep it joinless.
Let’s sum it up…
Questions ?
Thank You