AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier

Date post: 15-Jan-2015
Upload: amazon-web-services
Description:
Join our webinar to learn how to build a cost-effective archive application using Amazon Glacier, an extremely low-cost, secure, highly durable, and easy-to-use storage service in the AWS cloud. We will explain how Amazon Glacier works and walk through some best practices to get the most out of the service. We will also highlight how to choose between Amazon Glacier and Amazon S3’s Glacier storage option. Learn more: http://aws.amazon.com/glacier/
Transcript
Page 1: AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier

Archiving in the Cloud

Best Practices for Amazon Glacier

Colin Lazier & Henry Zhang

What We’ll Cover Today

Overview of Amazon Glacier

Amazon Glacier Key Concepts

Key Use Cases and Benefits

Best Practices with Amazon Glacier

Q & A

Overview of Amazon Glacier

With Amazon Glacier, You Can:

Achieve extremely low storage costs for archive data

Pay only for what you use

No longer maintain your own physical storage infrastructure

Increase durability and geographic redundancy

Secure your data

Access on-demand computing with Amazon EC2

What is Archival Data?

Most stored data is infrequently accessed (cold data)

Often older data that is still important for future reference

Typically long-lived (months or years)

Business and regulatory reasons to retain data

What is Amazon Glacier?

Extremely low cost archive storage service

Allows you to retrieve any amount of data within 3-5 hours

Provides high-durability storage

Makes it easy to retain data safely and securely for months, years, or decades

Benefits with Amazon Glacier

• Low cost: As little as $0.01/GB/month with no up-front capital commitments.

• Secure: Leverage AWS’ robust security platform. Control access to your data.

• Durable: Designed to provide an average annual durability of 99.999999999% per archive.

• Simple: Eliminate your operational overhead. Focus your resources on your core business.

• Use multiple services: Easily leverage other AWS services once your data is in the AWS cloud.

• Flexible: Store any amount of data on demand. Eliminate the need for capacity planning.

Customer Data Archiving Examples

• Media Archives: Media companies’ core assets (books, movies, music, TV, etc.) can grow to hundreds of petabytes. Amazon Glacier reduces the cost of storing these assets while simultaneously increasing the durability, ease of use, and accessibility of the content.

• Enterprise Archives: Enterprise information archiving includes archiving email, business documents, and other unstructured content. It is driven by business needs, compliance requirements, and the goal of reducing primary storage costs.

• Scientific Archives: Research and scientific organizations, such as pharmaceutical and biotech companies, as well as universities, store many large but rarely accessed data sets.

Amazon Glacier Key Concepts

High-level Amazon Glacier Architecture

• Archive Application (search, policy-based data management, eDiscovery): sends and receives data to and from Amazon Glacier via HTTP / REST APIs or AWS Import/Export

• Index: an index of your archived data, kept by the archive application (shown on Amazon RDS)

• Amazon IAM: controls access to your data

Amazon Glacier Concepts

Archives

An archive is a durably stored block of information. You store your data in Amazon Glacier as archives. You may upload a single file as an archive, but your request costs will be lower if you aggregate your data. TAR and ZIP are common formats that customers use to aggregate multiple files into a single file before uploading to Amazon Glacier.
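As a minimal sketch of this aggregation step, the following Python packs a list of local files into one in-memory TAR archive using the standard `tarfile` module. The helper name `aggregate_files` is hypothetical; the returned bytes would then be uploaded as a single Amazon Glacier archive.

```python
import io
import tarfile
from pathlib import Path


def aggregate_files(paths):
    """Pack many small files into one TAR archive held in memory.

    Hypothetical helper: returns the TAR bytes, which could then be
    uploaded to Amazon Glacier as a single archive (one upload request).
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for path in paths:
            path = Path(path)
            # Store each file under its base name inside the TAR.
            tar.add(str(path), arcname=path.name)
    return buffer.getvalue()
```

For example, `aggregate_files(["a.log", "b.log"])` returns the bytes of one TAR containing both files. Recording which files went into which archive belongs in your own index, since Amazon Glacier stores the archive as an opaque block.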

Vaults

You use vaults to organize the data you store in Amazon Glacier. Each archive is stored in a vault of your choice. You may control access to your data by setting vault-level access policies.
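Vault access policies are JSON documents in the IAM policy language. The sketch below builds one that lets a second AWS account start retrieval jobs and download their output; the region, account IDs, vault name, and choice of actions are all illustrative placeholders, and the function name `build_vault_policy` is hypothetical.

```python
import json


def build_vault_policy(region, account_id, vault_name, reader_account_id):
    """Return a JSON vault access policy granting another account retrieval access.

    All identifiers are placeholders; the actions listed are an illustrative
    choice, not a recommendation for any particular setup.
    """
    vault_arn = f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRetrieval",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{reader_account_id}:root"},
                "Action": ["glacier:InitiateJob", "glacier:GetJobOutput"],
                "Resource": vault_arn,
            }
        ],
    }
    return json.dumps(policy)
```

With the AWS SDK for Python (boto3), such a document could be attached using the Glacier client's `set_vault_access_policy(accountId="-", vaultName=..., policy={"Policy": policy_json})`, where `"-"` means the account that owns the credentials.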

Uploading Data to Amazon Glacier

1. Create a vault, optionally configuring access policies via Amazon Identity and Access Management (IAM).

2. Upload archives.

3. Retrieve archives: initiate a job, track the job, and download the job output. Archives are retrieved 3 - 5 hours after being requested.
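The upload and retrieval steps above can be sketched with the AWS SDK for Python (boto3). The Glacier client is passed in as `client` (for example `boto3.client("glacier")`) so the logic can be exercised without AWS credentials; the function names are hypothetical, and the calls used are the client's `create_vault`, `upload_archive`, `initiate_job`, `describe_job`, and `get_job_output` methods.

```python
def upload_and_request_retrieval(client, vault_name, data, description):
    """Create a vault, upload one archive, and start an archive-retrieval job.

    `client` is assumed to be a boto3 Glacier client; accountId "-" means
    the account that owns the credentials. Returns (archive_id, job_id).
    """
    # Step 1: create the vault.
    client.create_vault(accountId="-", vaultName=vault_name)

    # Step 2: upload the archive; record the returned ID in your own index.
    upload = client.upload_archive(
        accountId="-",
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    archive_id = upload["archiveId"]

    # Step 3: initiate a retrieval job; it typically completes hours later.
    job = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    return archive_id, job["jobId"]


def job_output_if_ready(client, vault_name, job_id):
    """Track the job; return the archive bytes once it has completed, else None."""
    status = client.describe_job(accountId="-", vaultName=vault_name, jobId=job_id)
    if not status["Completed"]:
        return None
    output = client.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Polling `job_output_if_ready` every few minutes is the simplest way to track a job; the notification approach described later in this deck avoids polling.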

Sending / Retrieving Data

• Amazon Glacier REST-based APIs to send and retrieve data

• AWS Direct Connect

• Amazon S3 lifecycle archival to Amazon Glacier
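For the S3 lifecycle route, a minimal sketch of a lifecycle configuration that moves objects under a prefix to the GLACIER storage class after a number of days is shown below. The prefix, day count, rule ID, and the helper name `glacier_transition_rule` are placeholders chosen for illustration.

```python
def glacier_transition_rule(prefix, days, rule_id):
    """Return an S3 lifecycle configuration that archives objects to Glacier.

    Objects whose keys start with `prefix` transition to the GLACIER storage
    class `days` days after creation. All arguments are placeholders.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

With boto3 this could be applied as `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=glacier_transition_rule("logs/", 30, "archive-logs"))`, where the bucket name is again a placeholder. Objects archived this way remain S3 objects and are restored through S3 rather than through the Glacier vault APIs.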

Additional Amazon Glacier / AWS Concepts

Vault Inventory

For a real-time view of the contents of your vaults, refer to your own index. For disaster recovery purposes, in case you lose or corrupt your index, Amazon Glacier maintains an inventory of all the archives in each vault. The vault inventory is updated approximately once a day.
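To use the inventory for recovery, you start an inventory-retrieval job (job parameters `{"Type": "inventory-retrieval", "Format": "JSON"}`) and, once it completes, download its JSON output, whose `ArchiveList` entries carry `ArchiveId`, `ArchiveDescription`, `CreationDate`, `Size`, and `SHA256TreeHash`. The sketch below rebuilds a minimal index from that output; the index layout and the name `rebuild_index` are hypothetical.

```python
import json


def rebuild_index(inventory_json):
    """Rebuild a minimal archive index from a vault inventory in JSON format.

    The returned {archive_id: {...}} mapping is a hypothetical index layout;
    a real archive application would likely store it in a database.
    """
    inventory = json.loads(inventory_json)
    index = {}
    for entry in inventory["ArchiveList"]:
        index[entry["ArchiveId"]] = {
            "description": entry.get("ArchiveDescription", ""),
            "created": entry["CreationDate"],
            "size": entry["Size"],
            "tree_hash": entry["SHA256TreeHash"],
        }
    return index


# Starting the inventory job with a boto3 Glacier client (not run here):
# client.initiate_job(accountId="-", vaultName="example-vault",
#                     jobParameters={"Type": "inventory-retrieval", "Format": "JSON"})
```

Because the inventory lags by up to about a day, archives uploaded since the last inventory update would be missing from a rebuilt index.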

Amazon Simple Notification Service (Amazon SNS)

Amazon SNS is a web service that makes it easy to set up, operate, and send notifications from the cloud.
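A vault can be configured to publish job-completion events to an SNS topic, so your application learns when retrieved data is ready instead of polling. The sketch below assumes `client` is a boto3 Glacier client and that the topic ARN (a placeholder) names an existing SNS topic; the function name is hypothetical.

```python
def configure_job_notifications(client, vault_name, topic_arn):
    """Ask Amazon Glacier to publish job-completion events to an SNS topic.

    `client` is assumed to be a boto3 Glacier client; `topic_arn` is a
    placeholder for an existing SNS topic that your application subscribes to.
    Returns the notification configuration that was sent.
    """
    config = {
        "SNSTopic": topic_arn,
        "Events": ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
    }
    client.set_vault_notifications(
        accountId="-", vaultName=vault_name, vaultNotificationConfig=config
    )
    return config
```

The SNS topic's access policy must also allow Amazon Glacier to publish to it; that setup is outside this sketch.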

Amazon Glacier Key Concepts

1. Create a vault. Optionally configure access policies via Amazon Identity and Access Management, and notification policies via Amazon Simple Notification Service (SNS).

2. Upload archives.

3. Retrieve archives: initiate a job, track the job, and download the job output. Archives are retrieved 3 - 5 hours after being requested; notifications are sent to your application via Amazon SNS, and your application then downloads the archives.

Creating vaults and configuring policies are AWS Management Console operations, also accessible via the Amazon Glacier APIs or SDKs. Uploading and retrieving archives are Amazon Glacier API operations, also accessible via the Amazon Glacier SDKs.

Best Practices with Amazon Glacier

Aggregate Large Numbers of Small Files

Reduce overhead costs

Reduce request costs

Find ideal archive size for your use case
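A back-of-the-envelope way to see the request savings: count upload requests when every file is its own archive versus when files are packed into archives of a chosen size. The per-request price below is a hypothetical input, not a quoted AWS price, and container overhead (TAR/ZIP headers) is ignored.

```python
import math


def upload_requests(file_count, avg_file_bytes, archive_bytes=None):
    """Count upload requests with and without aggregation.

    If archive_bytes is None, every file is uploaded as its own archive;
    otherwise files are packed into archives of roughly archive_bytes each.
    Ignores TAR/ZIP container overhead for simplicity.
    """
    if archive_bytes is None:
        return file_count
    total = file_count * avg_file_bytes
    return math.ceil(total / archive_bytes)


def request_cost(requests, price_per_1000):
    """Cost of `requests` at a hypothetical price per 1,000 requests."""
    return requests * price_per_1000 / 1000
```

For example, one million 100 KB files need 1,000,000 upload requests on their own, but only 100 if packed into 1 GB archives. Very large archives have their own cost (retrieving one file means retrieving, or range-retrieving, a big archive), which is why the ideal size depends on your access pattern.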

Uploading Large Files – Multipart Upload

Internet weather

Distance between your application and Amazon Glacier

Cost of retrying failed transmissions

Improve upload throughput
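Choosing a part size trades these factors against each other: smaller parts make a failed transmission cheaper to retry, while larger parts mean fewer requests. Amazon Glacier requires the part size to be 1 MiB multiplied by a power of two (up to 4 GiB), with at most 10,000 parts per upload. The sketch below picks a valid size; the 8 MiB default and the name `choose_part_size` are arbitrary assumptions.

```python
import math

MIB = 1024 * 1024
MAX_PARTS = 10_000               # parts allowed in one multipart upload
MAX_PART_SIZE = 4 * 1024 * MIB   # 4 GiB, the largest allowed part size


def choose_part_size(archive_bytes, preferred=8 * MIB):
    """Pick a Glacier multipart part size for an archive of archive_bytes.

    Starts from `preferred` rounded up to 1 MiB times a power of two, then
    doubles until the archive fits in MAX_PARTS parts.
    """
    size = MIB
    # Round the preferred size up to a power-of-two multiple of 1 MiB.
    while size < preferred and size < MAX_PART_SIZE:
        size *= 2
    # Grow the part size until the archive fits in MAX_PARTS parts.
    while math.ceil(archive_bytes / size) > MAX_PARTS:
        if size >= MAX_PART_SIZE:
            raise ValueError("archive too large for a single multipart upload")
        size *= 2
    return size
```

For a 100 MiB archive this keeps the 8 MiB default; for a 100,000 MiB archive it doubles to 16 MiB so the upload stays within 10,000 parts.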

Multipart Upload

Improve speed and reliability with multipart upload

1. InitiateMultipartUpload(partSize) -> uploadId

2. UploadPart(uploadId, byteRange, checksum, data), repeated for each part

3. CompleteMultipartUpload(uploadId, archiveSize, checksum) -> archiveId
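These three calls correspond to the boto3 Glacier client methods `initiate_multipart_upload`, `upload_multipart_part`, and `complete_multipart_upload`. Each part, and the finished archive, is identified by a SHA-256 tree hash: hash every 1 MiB chunk, then repeatedly hash concatenated pairs of digests until one remains. The sketch below sends parts sequentially and assumes `client` is a boto3 Glacier client; the function names are hypothetical.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data):
    """SHA-256 tree hash over 1 MiB chunks, returned as a hex string.

    Pairs of digests are hashed together level by level; an unpaired
    digest at the end of a level is carried up unchanged.
    """
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])
        level = nxt
    return level[0].hex()


def multipart_upload(client, vault_name, data, part_size, description=""):
    """Upload `data` in parts of `part_size` bytes and return the archive ID."""
    init = client.initiate_multipart_upload(
        accountId="-", vaultName=vault_name,
        archiveDescription=description, partSize=str(part_size))
    upload_id = init["uploadId"]
    for start in range(0, len(data), part_size):
        part = data[start:start + part_size]
        end = start + len(part) - 1
        # Each part carries its byte range and its own tree-hash checksum.
        client.upload_multipart_part(
            accountId="-", vaultName=vault_name, uploadId=upload_id,
            range=f"bytes {start}-{end}/*", checksum=tree_hash(part), body=part)
    done = client.complete_multipart_upload(
        accountId="-", vaultName=vault_name, uploadId=upload_id,
        archiveSize=str(len(data)), checksum=tree_hash(data))
    return done["archiveId"]
```

Because each part is independent, a production uploader could send parts in parallel to improve throughput and retry only the parts that fail.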

Optimize Data Retrieval and Download

Retrieval vs. Download

Ranged Retrieval

• Reduce cost, control retrieval rate

• Retrieve only what you need

Ranged Download (Get)

• Improve download speed

• Plan around your download speed: retrieved data is staged for only 24 hours
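Ranged retrieval uses the `RetrievalByteRange` job parameter, whose range must be megabyte-aligned (start on a 1 MiB boundary; end one byte before a 1 MiB boundary or at the last byte of the archive). Ranged download passes an HTTP-style `range` such as `bytes=0-1048575` to `get_job_output`. The sketch below only computes these range strings; the helper names and chunk sizes are hypothetical.

```python
MIB = 1024 * 1024


def retrieval_ranges(archive_bytes, chunk_bytes):
    """Split an archive into MiB-aligned "start-end" ranges for retrieval jobs.

    chunk_bytes must be a multiple of 1 MiB so each range starts on a 1 MiB
    boundary; every range ends just before a boundary except the last, which
    ends at the archive's final byte.
    """
    if chunk_bytes % MIB:
        raise ValueError("chunk_bytes must be a multiple of 1 MiB")
    ranges = []
    for start in range(0, archive_bytes, chunk_bytes):
        end = min(start + chunk_bytes, archive_bytes) - 1
        ranges.append(f"{start}-{end}")
    return ranges


def download_ranges(job_output_bytes, chunk_bytes):
    """HTTP Range values for downloading a job's output in pieces."""
    return [f"bytes={start}-{min(start + chunk_bytes, job_output_bytes) - 1}"
            for start in range(0, job_output_bytes, chunk_bytes)]
```

Each retrieval range would be used as `jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id, "RetrievalByteRange": r}`, and each download range as `client.get_job_output(accountId="-", vaultName=..., jobId=..., range=h)`; downloading pieces separately lets a failed piece be fetched again within the 24-hour staging window.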

Ranged Retrieval Example

Example: a 12 GB archive

• Retrieved using a single 4-hour job: 3 GB/hour peak retrieval

• Retrieved over 24 hours using 6 consecutive jobs: 0.5 GB/hour peak retrieval
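The arithmetic behind this example, assuming (as the slide does) that each retrieval job takes about 4 hours and that jobs run back to back, is sketched below; the function names are illustrative.

```python
def peak_retrieval_rate(archive_gb, jobs, hours_per_job=4.0):
    """Peak retrieval rate in GB/hour when an archive is split into equal,
    consecutive ranged-retrieval jobs that each take hours_per_job."""
    return archive_gb / jobs / hours_per_job


def total_hours(jobs, hours_per_job=4.0):
    """Wall-clock time when the jobs run one after another."""
    return jobs * hours_per_job
```

A single job gives 12 / 1 / 4 = 3 GB/hour; six 2 GB jobs give 12 / 6 / 4 = 0.5 GB/hour over 6 × 4 = 24 hours, which is the lower peak the slide describes.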


Thank You

Q&A with

Colin Lazier & Henry Zhang

http://aws.amazon.com/glacier
