
AWS Webcast - Achieving consistent high performance with Postgres on Amazon Web Services using EBS...

Date post: 16-Apr-2017
Upload: amazon-web-services
Mastering PostgreSQL with AWS

Jafar Shameem, Business Development Manager, Amazon Web Services
Miles Ward, Senior Manager, Solutions Architecture, Amazon Web Services
Jay Edwards, CTO, PalominoDB

Agenda

• AWS Storage Options and EBS

• EBS Provisioned IOPS

• About Postgres

• Postgres on AWS best practices

• Lessons learned from the OFA campaign

Storage Options on AWS

Block Storage (Elastic Block Store)

Use for:
• Access to raw unformatted block level storage
• Persistent storage

Object Storage (S3, Glacier)

Use for:
• Pictures, videos, highly durable media storage
• Cold storage for long-term archive

Amazon Elastic Block Store (EBS): Persistent Storage for EC2

• High performance block storage device
• Mount as drives to instances
• Persistent and independent of instance lifecycle

Features

• High performance file system: Mount EBS as drives and format as required
• Flexible size: Volumes from 1 GB to 1 TB in size
• Secure: Private to your instances
• Available: Replicated within an Availability Zone
• Backups: Volumes can be snapshotted for point-in-time restore
• Monitoring: Detailed metrics captured via CloudWatch

Standard and Provisioned IOPS Volume Types

Standard Volumes

• Optimized for: Workloads with low or moderate IOPS needs and occasional bursts.
• Volume attributes: Up to 1 TB, average 100 IOPS per volume. Best effort performance. Can be striped together for larger size and higher IOPS.
• Workloads: File server, Log processing, Websites, Analytics, Boot, etc.

Provisioned IOPS Volumes

• Optimized for: Transactional workloads requiring consistent IOPS.
• Volume attributes: Up to 1 TB, 4,000 IOPS per volume. Consistent IOPS. Can be striped together for larger size and higher IOPS.
• Workloads: Business applications, MongoDB, SQL Server, MySQL, Postgres, Oracle, etc.

Introducing Provisioned IOPS Volumes

❶ Select the new Provisioned IOPS volume type

❷ Specify the volume capacity

❸ Specify the number of IOs per second your application needs, up to 4,000 PIOPS per volume. The volume will deliver the specified IOs per second.

$ ec2-create-volume --size 500 --availability-zone us-east-1b --type io1 --iops 2000
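
The per-volume limits quoted in these slides (1 TB and 4,000 IOPS) imply a minimum number of striped volumes for any larger target. A minimal Python sketch of that arithmetic follows. The function names and constants are illustrative assumptions, not part of any AWS tool, and the limits are the ones stated here rather than current EBS limits.

```python
import math
from typing import Tuple

# Per-volume limits as stated in the slides (assumed, era-specific).
MAX_VOLUME_GB = 1024
MAX_VOLUME_IOPS = 4000


def volumes_needed(target_gb: int, target_iops: int) -> int:
    """Smallest stripe count that satisfies both the size and the IOPS target."""
    if target_gb <= 0 or target_iops <= 0:
        raise ValueError("targets must be positive")
    by_size = math.ceil(target_gb / MAX_VOLUME_GB)
    by_iops = math.ceil(target_iops / MAX_VOLUME_IOPS)
    return max(by_size, by_iops)


def per_volume_spec(target_gb: int, target_iops: int) -> Tuple[int, int, int]:
    """Return (volume_count, gb_per_volume, iops_per_volume) for an even stripe."""
    count = volumes_needed(target_gb, target_iops)
    return count, math.ceil(target_gb / count), math.ceil(target_iops / count)
```

For example, `per_volume_spec(2000, 16000)` yields four volumes of 500 GB and 4,000 IOPS each, which could then be created one by one with the command shown above.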

What are customers running on EBS?

Enterprises
• Enterprise workloads are built on block storage
• Oracle, SAP, Microsoft Applications
• Convenient, cost-effective, reliable file server

Gaming/Social/Mobile/Education
• Very high performance and consistent IO for NoSQL and relational DBs

Marketing / Analytics
• Fast sequential IO access

PostgreSQL

• Open-source RDBMS

• Rich features

• Extraordinary stability

• Focus on performance

• Full ACID compliance for applications requiring durability and availability

• Robust GIS functionality

PostgreSQL on AWS: Best Practices

Concepts

• Master PostgreSQL host
o Accepts both reads and writes
o May have many replicas
o Records transferred to replicas using Write-Ahead Logging (WAL)

• Secondary PostgreSQL host
o Receives WAL records from the master
o Replication can be real-time or delayed

• Hot standby
o A secondary host that can receive read queries

Installation

• Start with an Amazon Machine Image (AMI) of your choice

• Launch EC2 instance and attach EBS volume to it

• Install software from ftp.postgresql.org

• Edit EC2 security group to allow ingress for port 5432

• Edit postgresql.conf for:

o listen_addresses = '*'

• For master-slave configurations:

o Set max_wal_senders > 0
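
The two settings named above can be written out as configuration lines. The sketch below is a small Python helper that renders such lines. The `wal_level` entry and the value 3 for `max_wal_senders` are assumptions added for illustration (9.x-era streaming replication needs a WAL level above minimal), and the helper names are hypothetical.

```python
def render_setting(name, value):
    """Format one configuration line: strings are single-quoted, numbers are bare."""
    if isinstance(value, bool):
        rendered = "on" if value else "off"
    elif isinstance(value, (int, float)):
        rendered = str(value)
    else:
        # Embedded single quotes are doubled, as the config file syntax expects.
        rendered = "'" + str(value).replace("'", "''") + "'"
    return f"{name} = {rendered}"


def render_conf(settings):
    """Render a dict of settings as newline-separated configuration lines."""
    return "\n".join(render_setting(k, v) for k, v in settings.items())


MASTER_SETTINGS = {
    "listen_addresses": "*",      # from the slides
    "wal_level": "hot_standby",   # assumption: 9.x-era value, not in the slides
    "max_wal_senders": 3,         # slides only say "> 0"; 3 is illustrative
}

if __name__ == "__main__":
    print(render_conf(MASTER_SETTINGS))
```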

Temporary data / SSD Storage

• You can create a normal tablespace on instance storage and place UNLOGGED tables in it to take advantage of the increased performance available with SSDs

• When you create a new table, query its relfilenode and back up the file identified by the query result to permanent storage. (Be sure to do this before you put any data in the table.)
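
The steps above can be sketched as a short list of SQL statements. The Python helper below only builds the statement text. The tablespace name, path, and table definition are hypothetical, the identifiers are interpolated without quoting (so they must come from trusted input), and `pg_relation_filepath` is used as one way to locate the file behind the relfilenode.

```python
def unlogged_table_statements(tablespace, location, table, columns_sql):
    """Build SQL for an UNLOGGED table on an instance-store tablespace.

    All names are interpolated as-is; pass trusted values only.
    """
    return [
        f"CREATE TABLESPACE {tablespace} LOCATION '{location}';",
        f"CREATE UNLOGGED TABLE {table} ({columns_sql}) TABLESPACE {tablespace};",
        # Path (relative to the data directory) of the file to back up
        # before any rows are inserted.
        f"SELECT pg_relation_filepath('{table}');",
    ]


if __name__ == "__main__":
    for statement in unlogged_table_statements(
        "ssd_space", "/mnt/ssd/pgdata", "session_cache", "id bigint, payload text"
    ):
        print(statement)
```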

Architecture – Building Blocks

• Master

• Streaming Replication

Replication Basics

• Records are transferred to the replicas via Write-Ahead Logging (WAL)

• Replication can be real-time through “streaming replication” or delayed via “WAL archiving”

• Replication on PostgreSQL supports two levels of durability: asynchronous and synchronous. Only one replica can be in synchronous mode. You may, however, provide an ordered list of candidate synchronous replicas to take over if the current synchronous replica is down.

• Since version 9.2, PostgreSQL has supported Cascading Replication
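
The ordered-candidate rule for synchronous replication can be modeled in a few lines. This is a sketch of the selection logic as described above, assuming the first connected name in the list becomes the synchronous replica; the function and the replica names are hypothetical and this is not PostgreSQL's own code.

```python
from typing import Iterable, Optional, Sequence


def pick_synchronous_standby(
    candidates: Sequence[str], connected: Iterable[str]
) -> Optional[str]:
    """Return the first candidate (in priority order) that is currently connected."""
    live = set(connected)
    for name in candidates:
        if name in live:
            return name
    return None  # no candidate is connected


if __name__ == "__main__":
    order = ["replica_a", "replica_b", "replica_c"]
    # replica_a is down, so the next name in the list takes over.
    print(pick_synchronous_standby(order, {"replica_b", "replica_c"}))
```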

Architecture – Production Designs

• Functional Partitioning

• Vertically scale to largest EC2 instance and storage

• Tune for the available hardware

• Use replication to create multiple replicas if bound by reads

• Shard your data-sets if bound by writes
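
As an illustration of sharding for write-bound workloads, the sketch below routes a key to one of N shards with a stable hash. The slides do not say how OFA or PalominoDB sharded their data; the hash-modulo scheme, the function name, and the key format are assumptions chosen for brevity.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user in ("user:1", "user:2", "user:3"):
        print(user, "-> shard", shard_for(user, 4))
```

A plain modulo scheme remaps most keys when the shard count changes, so it suits cases where the number of shards is fixed up front.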

Architecture – Anti-patterns

• Vertical scaling does not offer all the benefits of horizontal scaling

• Scaling step-by-step when you know you need a big system is not efficient

• ACID compliance has a cost. Consider NoSQL data stores for logs or session data

• You might not need to do everything in the DB

Performance – Minimum production scale

• Always use Elastic Block Store (EBS)
o Significant write cache
o Superior random IO performance
o Enhanced durability compared to instance stores

Performance – Larger production scale

• Move up to higher bandwidth instance types (m1.xlarge, c1.xlarge, m2.4xlarge)

• Increase EBS volume size to > 300 GB

• Increase number of volumes in RAID set

Performance – Extra-large scale

• Leverage Cluster Compute instance types
o More bandwidth to EBS
o Ex. CC2 will make excellent primary nodes, particularly when paired with a large number of EBS volumes (= 8)

• Improve RAID configuration with:
o effective_io_concurrency = # of stripes in RAID set

Performance – Extra-large production scale

• Can also leverage SSD instance type (hi1.4xlarge)
o 2 x 1 TB SSD storage (ephemeral storage)
o Perfect for replicas

• If replicas run on SSD instance types, disable integrity features such as fsync and full_page_writes on those hosts to improve performance

Benchmarking storage

• Sequential test example:
o dd if=/dev/zero of=<location in the disk> bs=8192 count=10000 oflag=direct

• Seek test example:
o sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare
o sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw --file-fsync-all run
o sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup
o For more aggressive tests, add the --file-fsync-all option (as in the run step above), especially when comparing different filesystems (e.g. ext4 vs. XFS)
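
Where dd or sysbench is not at hand, a rough sequential-write check can be scripted. The sketch below writes 8 KB blocks to a temporary file and reports MB/s. It is not equivalent to the dd command above: it uses buffered writes followed by one fsync rather than `oflag=direct`, so the page cache can flatter the result. The function name and defaults are illustrative.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, block_size=8192, count=10000):
    """Write `count` blocks of `block_size` bytes, fsync once, return MB/s."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(count):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # force data to the device before timing stops
        elapsed = time.perf_counter() - start
        written = os.path.getsize(path)
    finally:
        os.remove(path)  # always clean up the scratch file
    return written / (1024 * 1024) / elapsed


if __name__ == "__main__":
    rate = sequential_write_mb_per_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Point `directory` at the mounted EBS or RAID volume to measure it rather than the system temp directory.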

Benchmarking storage through PostgreSQL

• Use pgbench

• Initialize the test data set at the desired scale:
o pgbench -i -s1000 -Upostgres database

• Run a simple test with 20 clients and 100 transactions each against the master:
o pgbench -c 20 -t 100 -Upostgres database

• Run a read-only, no-vacuum test against the slave:
o pgbench -S -n -c 20 -t 1000 -h slave -Upostgres database

• If you plan to use pgpool, run the tests against it instead of directly against the DB
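
When the same pgbench runs are repeated across instance types or volume layouts, it can help to build the argument lists in code. The sketch below uses only the flags that appear in the commands above; the helper names are hypothetical, and the resulting list is meant to be passed to something like `subprocess.run(argv, check=True)` on a host where pgbench is installed.

```python
from typing import List, Optional


def pgbench_init(database: str, user: str, scale: int) -> List[str]:
    """Arguments for initializing the pgbench tables at a given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), "-U", user, database]


def pgbench_run(
    database: str,
    user: str,
    clients: int,
    transactions: int,
    select_only: bool = False,
    no_vacuum: bool = False,
    host: Optional[str] = None,
) -> List[str]:
    """Arguments for a pgbench run; mirrors the flag order used in the slides."""
    argv = ["pgbench"]
    if select_only:
        argv.append("-S")  # read-only (SELECT) transactions
    if no_vacuum:
        argv.append("-n")  # skip the vacuum step before the test
    argv += ["-c", str(clients), "-t", str(transactions)]
    if host is not None:
        argv += ["-h", host]
    argv += ["-U", user, database]
    return argv


if __name__ == "__main__":
    print(" ".join(pgbench_run("database", "postgres", 20, 100)))
```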

Backups using EC2 snapshots

• Snapshot mounted volume:
o SELECT pg_start_backup('label', true);
o ec2-create-snapshot -d "postgres clone" vol-24592c0e
o SELECT pg_stop_backup();

• If operating near maximum I/O capacity, use a replica for backups
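
The ordering of the three snapshot steps matters: backup mode has to end even when the snapshot call fails. A minimal Python sketch of that ordering follows. The two callables are stand-ins supplied by the caller (for example a database cursor's execute method and a wrapper around the snapshot command); their names, and the default label, are assumptions rather than any real API.

```python
from typing import Callable


def snapshot_with_backup_mode(
    run_sql: Callable[[str], object],
    take_snapshot: Callable[[], str],
    label: str = "ebs-snapshot",
) -> str:
    """Enter backup mode, take the snapshot, and always leave backup mode."""
    run_sql(f"SELECT pg_start_backup('{label}', true);")
    try:
        return take_snapshot()
    finally:
        # Runs on success and on failure, so the server never stays in backup mode.
        run_sql("SELECT pg_stop_backup();")


if __name__ == "__main__":
    log = []
    snapshot_id = snapshot_with_backup_mode(
        log.append, lambda: "snap-placeholder", label="nightly"
    )
    print(snapshot_id, log)
```

With a RAID set, `take_snapshot` would need to snapshot every member volume before returning.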

Restores using an EC2 snapshot

• Check available snapshots
o $ ec2-describe-snapshots

• Create EBS volumes from each snapshot used to back up the DB
o $ ec2-create-volume --snapshot snap-219c1308 --availability-zone eu-west-1c

• Attach volumes to instances
o $ ec2-attach-volume -i i-96ec5edd -d /dev/sdc vol-eb1561c1

• If using a RAID set, attach the replacement volumes in the same order for easiest re-creation of the RAID volume in the OS

• Mount the volume on the instance and assign the corresponding permissions

Tunables

• Swappiness, vm, kernel tuning
o By default, the kernel parameters shmmax and shmall have very small values. They must be large enough for shared_buffers in postgresql.conf: if shared_buffers exceeds the kernel limits, PostgreSQL won't start.
o Set vm.swappiness to a value under 5, so that swap is used only when really necessary.

• File System Tuning
o XFS (nobarrier,noatime,noexec,nodiratime)
o EXT3/4

• You can use ext3 or non-journaled file systems for logs.
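
The relationship between shared_buffers and the kernel limits can be turned into a quick calculation. The sketch below derives shmmax (bytes) and shmall (pages) from a shared_buffers size. The 10% headroom and the 4096-byte page size are assumptions: the server allocates somewhat more shared memory than shared_buffers alone, and the exact overhead depends on other settings.

```python
PAGE_SIZE = 4096  # assumption: typical Linux page size; check with `getconf PAGE_SIZE`


def kernel_shm_settings(shared_buffers_bytes: int, headroom_percent: int = 10) -> dict:
    """Suggest kernel.shmmax (bytes) and kernel.shmall (pages) for a buffer size."""
    if shared_buffers_bytes <= 0:
        raise ValueError("shared_buffers_bytes must be positive")
    # Integer math avoids floating-point rounding on large byte counts.
    shmmax = shared_buffers_bytes * (100 + headroom_percent) // 100
    shmall = -(-shmmax // PAGE_SIZE)  # ceiling division: pages needed for shmmax
    return {"kernel.shmmax": shmmax, "kernel.shmall": shmall}


if __name__ == "__main__":
    for key, value in kernel_shm_settings(4 * 1024**3).items():
        print(f"{key} = {value}")
```

Later PostgreSQL releases (9.3 onward) allocate most shared memory differently and need far smaller System V limits, so this calculation applies mainly to the versions current when the slides were written.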

Tunables

• WAL
o It is strongly recommended to separate the data from the pg_xlog (WAL) folder. For the WAL files we strongly recommend the XFS filesystem, due to the high number of fsync calls generated.
o checkpoint_segments: the right value depends strictly on the amount of data modified on the instance. Start with a moderate value and monitor the logs for HINT messages.
o WAL segment files are 16 MB each, so they fill quickly if you have batch processes adding or modifying data. You could easily need more than 30 on a busy server.
o We recommend not using the ext3 file system if you plan to keep the WAL in the same directory as the data; ext3 handles fsync calls inefficiently.
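
To size the WAL volume for a given checkpoint_segments value, the usual upper bound from the PostgreSQL documentation of that era can be computed directly: normally no more than (2 + checkpoint_completion_target) × checkpoint_segments + 1 segment files. The sketch below applies that bound with 16 MB segments. The function names are illustrative, and the formula applies to releases before 9.5, where checkpoint_segments was replaced by max_wal_size.

```python
import math

SEGMENT_MB = 16  # WAL segment size stated in the slides


def max_wal_files(checkpoint_segments: int, checkpoint_completion_target: float = 0.5) -> int:
    """Upper bound on WAL segment files under normal operation (pre-9.5 formula)."""
    if checkpoint_segments < 1:
        raise ValueError("checkpoint_segments must be at least 1")
    return math.ceil((2 + checkpoint_completion_target) * checkpoint_segments) + 1


def max_wal_mb(checkpoint_segments: int, checkpoint_completion_target: float = 0.5) -> int:
    """Approximate disk space, in MB, that pg_xlog may need."""
    return max_wal_files(checkpoint_segments, checkpoint_completion_target) * SEGMENT_MB


if __name__ == "__main__":
    print(max_wal_files(30), "files,", max_wal_mb(30), "MB")
```

With checkpoint_segments = 30 and the default completion target of 0.5, this gives 76 files, or about 1.2 GB of WAL space.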

Tunables

• Memory Tuning
o shared_buffers is the most important and the most difficult memory variable to tune. A quick rule of thumb is to start with ¼ of your RAM.

• PGTune is a Python script that recommends a configuration according to the hardware on your server.
o https://github.com/gregs1104/pgtune/archive/master.zip
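
Two rules of thumb from these slides, shared_buffers at a quarter of RAM and effective_io_concurrency equal to the number of RAID stripes, can be combined into a starting configuration. The sketch below is a hypothetical helper for that purpose only; it is not PGTune and covers none of the other settings PGTune adjusts.

```python
def starting_points(ram_bytes: int, raid_stripes: int) -> dict:
    """Starting values from the slides' rules of thumb; tune from here."""
    if ram_bytes <= 0 or raid_stripes <= 0:
        raise ValueError("ram_bytes and raid_stripes must be positive")
    shared_buffers_mb = ram_bytes // 4 // (1024 * 1024)  # one quarter of RAM, in MB
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "effective_io_concurrency": raid_stripes,  # one per stripe in the RAID set
    }


if __name__ == "__main__":
    # Example: a 16 GB instance with an 8-volume RAID set.
    for name, value in starting_points(16 * 1024**3, 8).items():
        print(f"{name} = {value}")
```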

Monitoring

• Use the CloudWatch service to monitor:

o checkpoint_segments warnings

o Number of connections

o Memory usage and load average

o Slow queries

o Replication lag

http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
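
Replication lag is not a built-in CloudWatch metric, so it has to be computed and published as a custom value. One common approach, sketched below, is to compare WAL positions: the master's current location (for example from `pg_current_xlog_location()`) and the replica's replay location (for example from `pg_last_xlog_replay_location()`), both in the `X/Y` hexadecimal form. The function names are hypothetical, and the byte arithmetic is approximate on releases older than 9.3.

```python
def xlog_location_to_bytes(location: str) -> int:
    """Convert a WAL location such as '16/B374D848' to an absolute byte position."""
    high, low = location.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(master_location: str, replica_location: str) -> int:
    """Bytes of WAL the replica still has to replay (never negative)."""
    lag = xlog_location_to_bytes(master_location) - xlog_location_to_bytes(replica_location)
    return max(0, lag)


if __name__ == "__main__":
    # One 16 MB segment behind.
    print(replication_lag_bytes("0/3000000", "0/2000000"))
```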

Security

• Disk Encryption

o Filesystem or OS tools

• Row level Encryption

o pgcrypto

• SSL

• Authentication and Network

• And IAM!!

Lessons learned from the OFA campaign

Lessons Learned

• Use the best practices mentioned earlier

• Use Provisioned IOPS

• AWS Enterprise Support is definitely worth the cost

• Inventory Management is underrated – it’s magic!

• Trusted Advisor is much better than it used to be

• AWS Product Lifecycle

o Starts off not so good

o Gets LOTS better

• Hard to keep up to date with every feature of every product

• Slides will be made available here:
o http://aws.amazon.com/ebs/webinars/

• Benchmarking Postgres with EBS 4000 IOPS/volume
o http://palominodb.com/blog/2013/05/08/benchmarking-postgres-aws-4000-piops-ebs-instances

• Creating consistent EBS snapshots with MySQL and XFS on EC2
o http://alestic.com/2009/09/ec2-consistent-snapshot

• Understanding Amazon EBS Availability and Performance
o http://www.slideshare.net/AmazonWebServices/understanding-ebs-availabilityandperformance

• Benchmarking EBS performance:
o http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html

Get started on Provisioned IOPS today! aws.amazon.com/ebs

Questions: e-mail: [email protected]

