© Copyright 2019, Apps Associates LLC. All rights reserved.

White Paper

AMAZON AURORA: A Fast, Affordable and Powerful RDBMS


www.appsassociates.com

TABLE OF CONTENTS

Introduction

Multi-Tenant Logging and Storage Layer with Service-Oriented Architecture

High Availability with Self-Healing and Fault Tolerance

The Adoption of Log Structured Storage in Aurora

Lock Free Concurrency with MVCC (Multi Version Concurrency Control)

Survivable Caches for Faster Instance Crash Recovery


Introduction

Amazon Aurora is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. This white paper describes the unique features of Aurora that make it a highly efficient, scalable, reliable and self-healing relational database. It also explains how Aurora is faster than MySQL.

Multi-Tenant Logging and Storage Layer with Service-Oriented Architecture

Amazon has done a great job of applying service-oriented architecture to the database. It has built a storage service similar to the Elastic Block Store (EBS) service used by EC2 (Elastic Compute Cloud) server instances. AWS has created a service-oriented architecture for accessing storage from the Aurora database and has moved the logging and storage layer into a multi-tenant, scale-out, database-optimized storage service.

Amazon has integrated Aurora with other AWS services such as Amazon EC2, Amazon VPC (Virtual Private Cloud), Amazon DynamoDB, Amazon SWF and Amazon Route 53 for control plane operations. The control plane is the part of the service that manages the database itself: its capabilities include provisioning the system and keeping any metadata that Aurora needs about the system. The endpoint information for the RDS instance is managed in DNS through Amazon Route 53.

Storage is integrated with Amazon S3, which is designed for 99.999999999% durability, for continuous incremental backups.

[Figure: Aurora architecture. Data plane (SQL, Transactions, Caching, Logging + Storage) running in VPCs; control plane using Amazon DynamoDB, Amazon SWF and Amazon Route 53; backups to Amazon S3.]


High Availability with Self-Healing and Fault Tolerance

High availability in Aurora is provided automatically at no additional cost to the customer. By default, Aurora stores data across three Availability Zones (AZs) with two copies in each AZ, for a total of six copies. To ensure data consistency, AWS uses a quorum model: at least four of the six copies must successfully complete a write before a change to your data is considered done, and a read is considered consistent when at least three of the six copies agree. This quorum logic also means that a single failure does not stop the database. If an entire AZ goes down, the four remaining copies can still form a write quorum, so both reads and writes continue; if one more copy is lost after that, the three remaining copies still form a read quorum. Recovery is fast because the log-structured storage runs on SSDs (solid state drives), which keep latency very low and eliminate disk seek time. The most significant advantage of Amazon Aurora is that the system detects and repairs errors automatically.
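The quorum arithmetic described above can be checked with a small Python sketch. It assumes exactly what the text states (six copies, two per AZ, a write quorum of four and a read quorum of three); the function names and failure scenarios are illustrative only and are not an AWS API.

```python
COPIES_PER_AZ = 2
AZS = ["AZ1", "AZ2", "AZ3"]
TOTAL_COPIES = COPIES_PER_AZ * len(AZS)  # 6 copies in total
WRITE_QUORUM = 4  # a write must be acknowledged by at least 4 of 6 copies
READ_QUORUM = 3   # a read is consistent if at least 3 of 6 copies agree

# Standard quorum overlap conditions: every read quorum intersects every
# write quorum, and any two write quorums intersect.
assert READ_QUORUM + WRITE_QUORUM > TOTAL_COPIES
assert 2 * WRITE_QUORUM > TOTAL_COPIES


def surviving_copies(failed_azs=(), extra_failed_copies=0):
    """Count healthy copies after losing whole AZs plus some individual copies."""
    healthy = TOTAL_COPIES - COPIES_PER_AZ * len(failed_azs) - extra_failed_copies
    return max(healthy, 0)


def availability(healthy):
    """Return (can_read, can_write) for a given number of healthy copies."""
    return healthy >= READ_QUORUM, healthy >= WRITE_QUORUM


for label, healthy in [
    ("all copies healthy", surviving_copies()),
    ("one AZ lost", surviving_copies(failed_azs=["AZ1"])),
    ("one AZ plus one more copy lost", surviving_copies(["AZ1"], 1)),
]:
    can_read, can_write = availability(healthy)
    print(f"{label}: {healthy} copies, read={can_read}, write={can_write}")
```

Losing one AZ leaves four copies, which still satisfies both quorums; losing one further copy leaves three, enough for reads but not for writes until the missing copies are repaired.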

The Adoption of Log Structured Storage in Aurora

One of the major breakthrough features of Aurora, and one that makes it unique in the RDBMS space, is the log-structured storage of data. A log structure delivers high write throughput and simplifies hot backups, lock-free concurrency, recovery and snapshots. This helps Aurora speed up write operations as well as crash recovery. Log-structured storage is a well-researched concept in computer science and is very familiar to those who work with NoSQL products such as HBase and Cassandra. With log-structured storage, new records are appended to storage and existing records are never updated in place. B-tree indexes hold pointers to the latest version of each record. Periodically, stale records are removed by a garbage collection process.
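A toy Python sketch of the append-only idea: records are only ever appended, and an index (a plain dict here, standing in for the B-tree) maps each key to the offset of its latest version. The class name and the in-memory list used as the "log" are hypothetical simplifications, not Aurora's storage format.

```python
class AppendOnlyStore:
    """Toy log-structured store: the log is append-only; the index points at latest versions."""

    def __init__(self):
        self.log = []    # stands in for the on-disk log; entries are (key, value)
        self.index = {}  # stands in for the B-tree: key -> offset of latest version

    def put(self, key, value):
        # Never overwrite in place: append a new version and repoint the index.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def stale_count(self):
        # Entries no longer referenced by the index are garbage awaiting collection
        # (this toy model has no deletes, so every key has exactly one live entry).
        return len(self.log) - len(self.index)


store = AppendOnlyStore()
store.put("balance:42", 100)
store.put("balance:42", 75)  # the update is a new append, not an in-place write
print(store.get("balance:42"))              # 75
print(len(store.log), store.stale_count())  # 2 1
```

Note that `put` never reads the old record before writing the new one; it only repoints the index, which is the "no read-before-write" property the text relies on.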

[Figure: High Availability with Self-Healing and Fault Tolerance. Panels "Read availability" and "Read and write availability", each showing the SQL, Transactions and Caching layers above storage spread across AZ 1, AZ 2 and AZ 3.]

[Figure: Conventional file systems compared with a log-structured file system.]


Log-structured storage avoids reading before writing. Read-before-write, especially in a large distributed system, can become a bottleneck for read performance. During an update the system does not waste time seeking the record on disk to find where the original value was stored and writing back to that page. Instead, it simply appends the new version to the end of the log and uses B-tree indexes to hold pointers to the latest version of the record. When a new block or segment of storage is written, the index is updated to point to the most recent version of that data. In this way, an update never has to read the existing record from disk first.

A traditional RDBMS uses the log only for temporary storage; the permanent home for information is a random-access storage structure on disk. In contrast, a log-structured file system stores data permanently in the log: there is no other structure on disk. As a result, with log-structured storage the database performs only one write per change, against two writes (the log and the data page) in a traditional RDBMS. The diagram below compares traditional RDBMS recovery, which synchronously replays logs from the last checkpoint, with Aurora recovery, which is asynchronous and parallel and applies changes only at the segment level.

During garbage collection, the system cleans out stale records, reclaims the space and merges the latest versions of records into a smaller number of files.
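Garbage collection can be sketched as compaction: copy only the versions the index still points to into a new, smaller log and repoint the index. This minimal Python sketch assumes the same hypothetical (key, value) log and key-to-offset index used for illustration above; real systems compact file segments in the background rather than a single list.

```python
def compact(log, index):
    """Return a new log holding only live versions, plus an index pointing into it.

    log:   list of (key, value) entries, oldest first (append-only)
    index: dict key -> offset of the latest version in `log`
    """
    new_log, new_index = [], {}
    # Walk live offsets in log order so surviving records keep their relative order.
    for key, offset in sorted(index.items(), key=lambda item: item[1]):
        new_index[key] = len(new_log)
        new_log.append(log[offset])
    return new_log, new_index


log = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
index = {"a": 2, "b": 4, "c": 3}
new_log, new_index = compact(log, index)
print(new_log)    # [('a', 3), ('c', 4), ('b', 5)]
print(new_index)  # {'a': 0, 'c': 1, 'b': 2}
```

The stale versions `("a", 1)` and `("b", 2)` are dropped, and the five-entry log shrinks to three live entries.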

Recovery from failure in log-structured databases is almost instantaneous because the database file is itself a write-ahead log, so far less I/O is needed. After a failure, you can simply restart the database and update the pointers, and it is up and running again.


[Figure: Traditional RDBMS recovery compared with Aurora recovery. Traditional: checkpointed data plus redo log; a crash at T0 requires re-applying the redo log since the last checkpoint. Aurora: a crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously.]
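The contrast between the two recovery styles can be sketched in Python. Traditional recovery replays every redo record since the checkpoint before the database opens; the segment-level approach groups redo records by segment and applies them only when a segment is first accessed. The segment layout, record format and names are hypothetical, and this sketch applies redo sequentially rather than in parallel.

```python
from collections import defaultdict

# Hypothetical redo log since the last checkpoint: (segment_id, page, new_value)
REDO_LOG = [(0, "p1", "a"), (1, "p7", "b"), (0, "p2", "c"), (2, "p9", "d")]


def traditional_recovery(checkpointed, redo_log):
    """Replay the whole log synchronously before the database can open."""
    pages = {seg: dict(p) for seg, p in checkpointed.items()}
    for seg, page, value in redo_log:
        pages.setdefault(seg, {})[page] = value
    return pages


class SegmentRecovery:
    """Open immediately; each segment applies only its own redo records on first access."""

    def __init__(self, checkpointed, redo_log):
        self.pages = {seg: dict(p) for seg, p in checkpointed.items()}
        self.pending = defaultdict(list)
        for seg, page, value in redo_log:
            self.pending[seg].append((page, value))
        self.applied = set()

    def read(self, seg, page):
        if seg not in self.applied:
            # Apply this segment's redo records on demand; other segments are untouched.
            for p, value in self.pending.pop(seg, []):
                self.pages.setdefault(seg, {})[p] = value
            self.applied.add(seg)
        return self.pages.get(seg, {}).get(page)


checkpoint = {0: {"p1": "old"}, 1: {}, 2: {}}
lazy = SegmentRecovery(checkpoint, REDO_LOG)
print(lazy.read(0, "p1"))    # a
print(sorted(lazy.applied))  # [0]  (segments 1 and 2 not yet recovered)
print(traditional_recovery(checkpoint, REDO_LOG)[0])  # {'p1': 'a', 'p2': 'c'}
```

Both approaches reach the same final state; the difference is that the segment-level version can serve reads before the whole log has been replayed.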


Recovery is fast in log-structured storage: on restart, the engine simply begins from the previous checkpoint, scans forward in the log and recovers any updates written since that checkpoint.
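Roll-forward recovery from a checkpoint can be sketched as: start from the index saved at the last checkpoint, then scan the log from the checkpoint position to the end and repoint the index for every record written after it. The log format and checkpoint representation here are hypothetical.

```python
def recover(log, checkpoint_index, checkpoint_position):
    """Rebuild the key -> offset index after a crash.

    log:                 list of (key, value) entries that survived the crash
    checkpoint_index:    index saved at the last checkpoint
    checkpoint_position: number of log entries covered by that checkpoint
    """
    index = dict(checkpoint_index)  # copy, so the saved checkpoint stays intact
    # Only records written since the checkpoint need to be scanned.
    for offset in range(checkpoint_position, len(log)):
        key, _value = log[offset]
        index[key] = offset
    return index


log = [("x", 1), ("y", 2), ("x", 3), ("z", 4)]
saved = {"x": 0, "y": 1}       # checkpoint taken after the first two records
print(recover(log, saved, 2))  # {'x': 2, 'y': 1, 'z': 3}
```

Only two of the four log entries are scanned; the more often checkpoints are taken, the shorter the scan.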

Writes are buffered in memory to provide consistent performance and are flushed to disk asynchronously, which improves performance significantly.
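A generic write-back buffer illustrates the idea: the caller's write returns as soon as the record is queued in memory, and a background thread persists it. This Python sketch assumes a single worker thread and uses a list as a stand-in for disk; the class is hypothetical and says nothing about how Aurora decides when a commit is durable.

```python
import queue
import threading


class AsyncWriteBuffer:
    """Hypothetical write-back buffer: writes return once queued in memory,
    and a background thread persists them (a list stands in for the disk)."""

    def __init__(self):
        self.disk = []                 # stand-in for durable storage
        self._pending = queue.Queue()  # in-memory write buffer
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def write(self, record):
        # The caller does not wait for the disk write to happen.
        self._pending.put(record)

    def _flush_loop(self):
        while True:
            record = self._pending.get()
            self.disk.append(record)  # the "disk" write happens here, off the caller's path
            self._pending.task_done()

    def flush(self):
        # Block until every buffered write has been persisted.
        self._pending.join()


buffer = AsyncWriteBuffer()
for i in range(3):
    buffer.write(f"row-{i}")
buffer.flush()
print(buffer.disk)  # ['row-0', 'row-1', 'row-2']
```

Because a single worker drains a FIFO queue, records reach the "disk" in the order they were written.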

Backups are continuous and incremental: each new log segment is copied to backup storage as it is written.
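A minimal Python sketch of continuous, incremental backup: every log segment is copied to backup storage at the moment it is written, so no full copy of the database is ever taken. The class name, the dict standing in for backup storage (such as Amazon S3), and the ability to restore up to an earlier segment are assumptions for illustration.

```python
class ContinuousBackup:
    """Hypothetical sketch: every log segment is copied to backup storage as it is written."""

    def __init__(self):
        self.segments = []  # local log segments, in write order
        self.backup = {}    # stands in for backup storage (e.g. Amazon S3): seq -> segment

    def append_segment(self, records):
        seq = len(self.segments)
        segment = tuple(records)
        self.segments.append(segment)
        self.backup[seq] = segment  # incremental: only the new segment is copied

    def restore(self, up_to_seq=None):
        """Rebuild the log from backup, optionally stopping at segment `up_to_seq` (inclusive)."""
        if not self.backup:
            return []
        last = max(self.backup) if up_to_seq is None else up_to_seq
        return [rec for seq in range(last + 1) for rec in self.backup[seq]]


backup = ContinuousBackup()
backup.append_segment(["r1", "r2"])
backup.append_segment(["r3"])
print(backup.restore())             # ['r1', 'r2', 'r3']
print(backup.restore(up_to_seq=0))  # ['r1', 'r2']
```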

Lock Free Concurrency with MVCC (Multi Version Concurrency Control)

Aurora adopts multi-version concurrency control, in which data is never updated in place, only appended, and a read returns a copy of the data in the exact state it existed when the transaction started. In an MVCC system with ongoing transactions there may be several copies of a given item of data, representing its state at different points in time. A read simply ignores updates by other transactions that occur after its own transaction began. With this approach, readers never block each other, nor do they block writers. This non-blocking behavior is often combined with optimistic concurrency control, as the example that follows shows.

MVCC provides point-in-time consistent views. Read transactions under MVCC typically use a timestamp or transaction ID to determine which state of the database to read, and then read that version of the data. Readers and writers are thus isolated from each other without any need for locking: writes create a newer version, while concurrent reads access the older version.
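A toy Python sketch of MVCC snapshot reads: each write appends a new version stamped with a timestamp from a logical clock, and a reader sees only versions stamped at or before its snapshot. The logical clock and class are hypothetical, and every write is treated as committed immediately; real engines also track uncommitted versions.

```python
import itertools


class MVCCStore:
    """Toy MVCC store: each write adds a version stamped with a commit timestamp."""

    def __init__(self):
        self._clock = itertools.count(1)  # hypothetical logical clock for timestamps
        self.versions = {}                # key -> list of (commit_ts, value), oldest first

    def begin(self):
        # A transaction's snapshot is the timestamp at which it started.
        return next(self._clock)

    def write(self, key, value):
        # Writers never overwrite: they append a newer version.
        ts = next(self._clock)
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot; ignore later ones.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


db = MVCCStore()
db.write("price", 10)               # ts 1
snap = db.begin()                   # ts 2: this snapshot sees price = 10
db.write("price", 12)               # ts 3: a concurrent writer creates a new version
print(db.read("price", snap))        # 10
print(db.read("price", db.begin()))  # 12
```

The reader holding `snap` keeps seeing 10 even after the new version exists, and neither side waited on a lock.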

For example, Jane requests a piece of data at 10:10. The system looks up the data using the pointers in the index and gives her the most current copy of the data at the time the read was executed. While this read is happening, John changes the data at 10:11. This is fine for Jane, because she got the most up-to-date version of the data that was available when she requested it. If Jane now wants to modify a value, the system checks whether the pointer has changed since her read. If it has, there is a concurrency conflict, and the system rereads the latest data before it writes Jane's update. This avoids many of the scaling issues caused by relational databases' heavy use of locks.
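The Jane and John scenario can be sketched in Python as an optimistic version check. Here a version number stands in for the index pointer the text describes; the class and function names are hypothetical. In a real system the check and the write must happen atomically; this single-threaded sketch does not model that.

```python
class VersionedRecord:
    """A record whose version number changes on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_if_unchanged(self, new_value, expected_version):
        # Optimistic check: commit only if nobody changed the record since our read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def update_with_retry(record, change, max_attempts=5):
    """Read, compute, and try to write; on conflict, reread the latest data and retry."""
    for _ in range(max_attempts):
        value, version = record.read()
        if record.write_if_unchanged(change(value), version):
            return True
    return False


record = VersionedRecord(100)
jane_value, jane_version = record.read()          # Jane reads at 10:10
record.write_if_unchanged(150, record.read()[1])  # John changes the data at 10:11
print(record.write_if_unchanged(jane_value + 1, jane_version))  # False: conflict detected
print(update_with_retry(record, lambda v: v + 1))  # True: reread 150, wrote 151
print(record.value)                                # 151
```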

Survivable Caches for Faster Instance Crash Recovery

AWS has moved the cache out of the database process, so it remains warm when the database restarts. This helps the instance resume fully loaded operations much faster, especially during instance crash recovery.

Instant crash recovery + survivable cache = quick and easy recovery from DB failures
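To illustrate why keeping the cache outside the database process helps, the Python sketch below models the cache as an object owned separately from the engine, so a restarted engine reuses the warm cache. This is a conceptual model with hypothetical class names, not how Aurora implements its cache.

```python
class PageCache:
    """Buffer cache that lives outside the database process (hypothetical model)."""

    def __init__(self):
        self.pages = {}


class DatabaseProcess:
    """Stands in for the database engine; it only holds a reference to the external cache."""

    def __init__(self, cache, storage):
        self.cache = cache
        self.storage = storage
        self.hits = 0
        self.misses = 0

    def read_page(self, page_id):
        if page_id in self.cache.pages:
            self.hits += 1
        else:
            self.misses += 1  # cold read goes to the storage layer
            self.cache.pages[page_id] = self.storage[page_id]
        return self.cache.pages[page_id]


storage = {n: f"page-{n}" for n in range(4)}
cache = PageCache()  # survives restarts because it is not owned by the process

db = DatabaseProcess(cache, storage)
for n in range(4):
    db.read_page(n)  # warms the cache: 4 misses

db = DatabaseProcess(cache, storage)  # simulate a crash and restart of the engine
for n in range(4):
    db.read_page(n)
print(db.hits, db.misses)  # 4 0: the restarted process starts with a warm cache
```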

In summary, these features, built on a MySQL-compatible engine with the power of Amazon RDS, make Aurora a fast, affordable and powerful RDBMS.


Author Profile

Satyendra Kumar Pasalapudi is an Associate Practice Director in the Infrastructure Managed Services team at Apps Associates. He is an Oracle ACE Director and a speaker at conferences such as OAUG, IOUG and Oracle OpenWorld. He has worked with various clients on infrastructure, cloud and AWS engagements. He can be contacted at [email protected]


North America (HQ), Europe, Asia

ABOUT APPS ASSOCIATES

Apps Associates is the recognized industry leader for migrating Oracle to the cloud and managing it there. With thousands of engagements, Apps Associates brings the knowledge, flexibility and relentless customer-first focus companies rely upon to help them move to the cloud and solve their most strategic and complex business challenges. Acting as an extension of customers' IT teams, Apps Associates delivers a breadth of services and dependability along with unparalleled agility and ROI. Longstanding customers such as Sensata Technologies, Brooks Automation, Hologic and Take Two Interactive turn to Apps Associates as their trusted partner for the management of critical business needs, providing strategic consulting and managed services for Oracle, Amazon Web Services, Salesforce, integration, analytics and hybrid cloud infrastructure.

© Copyright 2019, V5.0-1115, Apps Associates LLC. All rights reserved

