
Scaling AWS Redshift Concurrency with Postgres

Date post: 21-Jun-2020
Scaling AWS Redshift Concurrency with Postgres By Elliott Cordo, Will Liu, Paul Singman
Transcript
Page 1:

Scaling AWS Redshift Concurrency with Postgres

By Elliott Cordo, Will Liu, Paul Singman

Page 2:

Integrated luxury and lifestyle company with offerings centered on movement, nutrition, and regeneration

We operate more than 200 locations within every major city across the country, in addition to London and Canada

Page 3:

Analytics Overview

1. Extract data from source systems

2. Transform raw data into useful metrics

3. Analyze, report, and visualize

[Diagram: sources (DB, DB, CRM, Clickstream, ERP, API, Finance, FTP) → ETL → Data Warehouse → Reporting & Analytics Apps, Third Party Integrations, ML Modeling & Insights]

Page 4:

Why A Data Warehouse?

• Prevent data discrepancies

• Lower employee learning curve

• Avoid duplicating logic in multiple systems

• Isolate production DBs from analytic workloads

Page 5:

Presents a single point of failure, so we created a data replication failover procedure

Page 6:

Data Ecosystem @ Equinox

Maximilian

1. Extract data from source systems

Page 7:

Data Ecosystem @ Equinox

AWS Redshift

2. Transform raw data into useful metrics

Maximilian

Page 8:

Redshift Warehouse Structure (J.A.R.V.I.S.)

Raw Landing Tables

Big piles of SQL

Fact & Dimension Tables

• d_facility

• d_membership

• f_checkin

Smaller piles of SQL

Data Marts

• Member Profile

• Group Fitness

• Retail

Page 10:

Data Ecosystem @ Equinox

• Equinox Apps

• Reporting/Dashboards

• Recommender Systems

• Internal APIs

• Ad-hoc Analysis

• 3rd Party Data Integrations

3. Analyze, report, and visualize

Maximilian

AWS Redshift

Page 23:

Key Features of Redshift

• Released by AWS in early 2013, based on Postgres v8.0.2

• Column-oriented storage

• Immutable 1MB block storage

• Massively-parallel processing compute engine

• Native multi-node (leader + workers) architecture

• Distribution key & sort key table settings

• Workload Management queue settings

Page 27:

10x - 100x slower on simple SELECT

500 connection limit (per cluster)

50 connection limit (per user-defined queue)

• High-frequency transactions

• Concurrent user connections

• Petabyte-scale disk storage

• Batch insertions & retrieval

• Complex computations

Page 28:

Warehouse Consumers

AWS Redshift

Equinox Apps

Recommender System

Internal APIs

Ad-hoc Analytics

Reporting/Dashboards

Page 29:

Warehouse Consumers

AWS Redshift

Reporting/Dashboards *

Page 33:

Sales Reporting Architecture

Problem

[Diagram: Moso (dbo.agreements, dbo.sales) → AWS Redshift (raw_agreements, raw_sales, membership_sales); labeled ~10 min]

10x - 100x slower on simple SELECT

500 connection limit (per cluster)

50 connection limit (per user-defined queue)

Page 43:

Potential Solutions

1. Pull from source DB

Pros

• Source DB is OLTP

Cons

• Sales logic is complex!

• Burdens prod DB

2. Cache in Redis

Pros

• Very fast performance

Cons

• Non-relational data structure

• Keys must be created for every data view

3. Copy to another DB

Pros

• Maintain relational format

Cons

• Additional ETL step

Page 46:

PostgreSQL Foreign Data Wrapper

https://aws.amazon.com/blogs/big-data/join-amazon-redshift-and-amazon-rds-postgresql-with-dblink

Page 47:

Proposed Sales Reporting Architecture

[Diagram: AWS Redshift (raw_agreements, raw_sales, membership_sales) → PostgreSQL (membership_sales_mv)]

Page 49:

Environment Set-up

✓ Create Redshift cluster

✓ Create PostgreSQL server (9.5+)

• RDS recommended

• For self-managed, install Postgres contrib package:

    sudo yum install postgresql10-contrib.x86_64

✓ Networking (AWS)

• Co-locate in same Availability Zone

• Configure Security Group

Page 50:

Creating the Link

--1 enable the required extensions
CREATE EXTENSION postgres_fdw;
CREATE EXTENSION dblink;

--2 create the external server
CREATE SERVER jarvis
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'REDSHIFT_ENDPOINT', port '5439', dbname 'REDSHIFT_DB_NAME', sslmode 'require');

--3 save redshift login to this external server
CREATE USER MAPPING FOR PG_USERNAME
    SERVER jarvis
    OPTIONS (user 'RS_USERNAME', password 'RS_PASSWORD');

Page 51:

Running queries on PostgreSQL

SELECT *
FROM dblink('jarvis', $REDSHIFT$
    SELECT member_sales_id, member_id, sales_action, sales_action_date
    FROM rs_landing.raw_sales
$REDSHIFT$) AS sales_actions (
    member_sales_id varchar(50),
    member_id varchar(50),
    sales_action varchar(50),
    sales_action_date date
);
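The column definition list after AS has to repeat, in order, every column the remote SELECT returns, so the two are easy to let drift apart. A minimal sketch of rendering both from one column spec follows. Python is used only as a templating aid here, and the helper name build_dblink_query is hypothetical; it is not part of dblink or of the talk.

```python
from typing import Sequence, Tuple


def build_dblink_query(server: str, remote_table: str,
                       columns: Sequence[Tuple[str, str]],
                       alias: str = "t") -> str:
    """Render a dblink SELECT whose remote column list and local
    column definition list come from the same (name, type) pairs.

    Identifiers are interpolated as-is, so this is only suitable for
    trusted, hard-coded names (an assumption of this sketch).
    """
    if not columns:
        raise ValueError("dblink needs at least one column definition")
    remote_cols = ", ".join(name for name, _ in columns)
    local_defs = ",\n    ".join(f"{name} {pg_type}" for name, pg_type in columns)
    return (
        f"SELECT *\n"
        f"FROM dblink('{server}', $REDSHIFT$\n"
        f"    SELECT {remote_cols}\n"
        f"    FROM {remote_table}\n"
        f"$REDSHIFT$) AS {alias} (\n"
        f"    {local_defs}\n"
        f");"
    )


# Column names and types taken from the query on this slide.
SALES_COLUMNS = [
    ("member_sales_id", "varchar(50)"),
    ("member_id", "varchar(50)"),
    ("sales_action", "varchar(50)"),
    ("sales_action_date", "date"),
]

query = build_dblink_query("jarvis", "rs_landing.raw_sales",
                           SALES_COLUMNS, alias="sales_actions")
print(query)
```

Adding or retyping a column then means touching one list rather than two places in the SQL text.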

Page 52:

Leveraging a Materialized View

CREATE MATERIALIZED VIEW pg.membership_sales_copy AS (
    SELECT *
    FROM dblink('jarvis', $REDSHIFT$
        SELECT member_sales_id, member_id, sales_action, sales_action_date
        FROM rs.membership_sales
    $REDSHIFT$) AS membership_sales_copy (
        member_sales_id varchar(50),
        member_id varchar(50),
        sales_action varchar(50),
        sales_action_date date
    )
);

REFRESH MATERIALIZED VIEW pg.membership_sales_copy;

Page 53:

Sales Reporting Architecture

[Diagram: AWS Redshift (raw_agreements, raw_sales, membership_sales) → dblink → PostgreSQL (membership_sales_mv)]

Page 54:

Materialized View Roadblock

[Diagram: AWS Redshift (membership_sales, millions of records) → PostgreSQL (membership_sales_mv); labeled ~10 mins]

Page 55:

Change Data Capture for Large Tables

[Diagram: AWS Redshift (membership_sales → membership_sales_cdc) → dblink → PostgreSQL (membership_sales_mv → membership_sales)]
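The CDC table only needs rows changed since some cutoff, which the next slide passes in as from_date. A minimal sketch of computing that cutoff follows, assuming a fixed lookback of a few hours and a date-granularity filter. The function name and the six-hour default are illustrative assumptions, not values from the talk.

```python
from datetime import date, datetime, timedelta


def cdc_from_date(now: datetime, lookback_hours: int = 6) -> date:
    """Return the earliest date the CDC extract should include.

    The filter compares against a date, so a lookback that crosses
    midnight pulls in the whole previous day as well.
    """
    if lookback_hours <= 0:
        raise ValueError("lookback_hours must be positive")
    return (now - timedelta(hours=lookback_hours)).date()


# 02:30 with a 6-hour lookback reaches back into the previous day.
early_morning = cdc_from_date(datetime(2020, 6, 21, 2, 30))
# 15:00 with the same lookback stays within the current day.
afternoon = cdc_from_date(datetime(2020, 6, 21, 15, 0))
print(early_morning.isoformat(), afternoon.isoformat())
```

Overlapping windows re-extract some rows that were already copied, which is harmless as long as the downstream load is keyed on member_sales_id.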

Page 56:

--Step 1 (Redshift): Create staging table in Redshift with last few hours of sales actions
--CREATE TABLE rs_landing.stage_sales_action
DELETE FROM rs.membership_sales_cdc;

INSERT INTO rs.membership_sales_cdc
SELECT member_sales_id, member_id, sales_action, sales_action_date
FROM rs.membership_sales
WHERE date >= '$[?from_date]';

--Step 2 (PostgreSQL): Refresh materialized view in Postgres
REFRESH MATERIALIZED VIEW pg.membership_sales_mv;

--Step 3 (PostgreSQL): Upsert logic to populate final table in Postgres from materialized view

--temp table to hold last batch
DROP TABLE IF EXISTS cdc_sales;
CREATE TEMP TABLE cdc_sales AS
SELECT * FROM pg.membership_sales_mv;

--update changed records, member_sales_id as the key to identify a unique record
--(Postgres does not allow a table alias on the SET target columns)
UPDATE pg.membership_sales ms
SET member_id = s.member_id,
    sales_action = s.sales_action,
    sales_action_date = s.sales_action_date
FROM cdc_sales s
WHERE s.member_sales_id = ms.member_sales_id;

--delete the records we just updated from temp table
DELETE FROM cdc_sales s
USING pg.membership_sales ms
WHERE s.member_sales_id = ms.member_sales_id;

--insert new records not found in membership_sales
INSERT INTO pg.membership_sales
SELECT * FROM cdc_sales;

--drop temp table
DROP TABLE cdc_sales;
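The update / delete / insert sequence in Step 3 can be exercised without a Postgres instance. The sketch below replays the same pattern against an in-memory SQLite database as a stand-in. The sample rows are made up, and the UPDATE is written with correlated subqueries because older SQLite versions lack UPDATE ... FROM; it is an illustration of the pattern, not the authors' code.

```python
import sqlite3


def apply_cdc_batch(conn: sqlite3.Connection) -> None:
    """Merge cdc_sales into membership_sales, keyed on member_sales_id."""
    cur = conn.cursor()
    # Update rows whose key already exists in the target table.
    cur.execute("""
        UPDATE membership_sales
        SET member_id = (SELECT s.member_id FROM cdc_sales s
                         WHERE s.member_sales_id = membership_sales.member_sales_id),
            sales_action = (SELECT s.sales_action FROM cdc_sales s
                            WHERE s.member_sales_id = membership_sales.member_sales_id),
            sales_action_date = (SELECT s.sales_action_date FROM cdc_sales s
                                 WHERE s.member_sales_id = membership_sales.member_sales_id)
        WHERE member_sales_id IN (SELECT member_sales_id FROM cdc_sales)
    """)
    # Drop the rows just applied so only new keys remain in the batch.
    cur.execute("""
        DELETE FROM cdc_sales
        WHERE member_sales_id IN (SELECT member_sales_id FROM membership_sales)
    """)
    # Insert whatever is left: keys not yet present in the target.
    cur.execute("INSERT INTO membership_sales SELECT * FROM cdc_sales")
    cur.execute("DROP TABLE cdc_sales")
    conn.commit()


conn = sqlite3.connect(":memory:")
cols = ("(member_sales_id TEXT PRIMARY KEY, member_id TEXT, "
        "sales_action TEXT, sales_action_date TEXT)")
conn.execute(f"CREATE TABLE membership_sales {cols}")
conn.execute(f"CREATE TEMP TABLE cdc_sales {cols}")
conn.executemany("INSERT INTO membership_sales VALUES (?, ?, ?, ?)", [
    ("s1", "m1", "join", "2020-06-01"),
    ("s2", "m2", "join", "2020-06-02"),
])
conn.executemany("INSERT INTO cdc_sales VALUES (?, ?, ?, ?)", [
    ("s2", "m2", "cancel", "2020-06-20"),  # changed record
    ("s3", "m3", "join", "2020-06-21"),    # new record
])
apply_cdc_batch(conn)
rows = conn.execute(
    "SELECT member_sales_id, sales_action FROM membership_sales "
    "ORDER BY member_sales_id"
).fetchall()
print(rows)
```

Row s1 is absent from the batch and is left untouched. By the same logic, a row deleted in Redshift never shows up in a batch, so its copy stays in the Postgres table.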

Page 57:

Sales Reporting Architecture

Solution

[Diagram: AWS Redshift (membership_sales → membership_sales_cdc) → dblink → PostgreSQL (membership_sales_mv → membership_sales)]

Page 59:

Learnings & Notes

• Minimal maintenance on Postgres instance

• Won’t reflect source deletions

• Limited to a few tables

• Flexible for schema evolution

Page 60:

Looking to the Future…

• Make use of foreign table

• Front-end scaling with read-replicas

• Extensible to other datastores

• Event-based streaming architecture

Page 61:

More to Explore

Page 62:

Q&A

Page 63:

We Are Hiring

Email [email protected]

• Head of Engineering

• React Native Engineer

• Sr. React Native Engineer

• API Engineer

• SDET Java - Architect

• SDET Javascript - Architect

• SDET Java

• SDET Javascript

• Sr. UX Researcher

Equinox Tech Blog http://tech.equinox.com/

