Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AWS Online Tech Talks

Posted on 11-Apr-2017



Active Archiving with Amazon S3 ... and Tiering to Amazon Glacier

Marc Trimuschat, AWS Storage Services

Data has gravity

…it is easier to move processing to the data

(Diagram: processing and partner tools alongside data such as 4K/8K media, genomics, seismic, financial data, logs, and IoT.)

Cloud Data Migration

• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose

AWS Storage Platform and Solutions: The AWS Storage Portfolio

• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• File: Amazon EFS

Audio Archives – SoundCloud

• World’s leading social sound platform
• Audio files transcoded and stored in multiple formats
• Stores PBs of data
• Transcoded files served from Amazon S3
• Originals moved to Amazon Glacier for long-term retention

Satellite Image Archive

• DigitalGlobe takes satellite imagery of the Earth
• 100 PB image library = 6 billion square kilometers
• 1 PB of new imagery every year
• Images to be archived and retained for decades

Patient Data – Philips Healthcare

• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses HIPAA-eligible AWS services covered by the AWS Business Associate Addendum (BAA)

Archive: data retained for the long term, for compliance or potential future reference

Data archiving needs are growing everywhere

• Media assets (4K, 8K)
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs

AWS Storage Review

Choice of storage classes

• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data

- Transition Standard to Standard-IA

- Transition Standard-IA to Amazon Glacier

- Expiration lifecycle policy

- Versioning support

- Prefix support
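The lifecycle features above (Standard to Standard-IA, Standard-IA to Glacier, expiration, versioning, prefix scoping) combine into a single rule. A minimal sketch, assuming the request shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the prefixes (`logs/`, `videos/`), day counts, and the helper name `archive_rule` are hypothetical.

```python
import json


def archive_rule(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                 expire_days: int = 3650) -> dict:
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier -> expire.

    The rule applies only to keys under `prefix`; day counts are illustrative.
    """
    if ia_days < 30:
        # S3 lifecycle cannot move objects to STANDARD_IA before they are 30 days old
        raise ValueError("STANDARD_IA transition needs Days >= 30")
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": "archive-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
        # Versioned buckets: noncurrent versions go to Glacier, then are deleted
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
        ],
        "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
    }


lifecycle = {"Rules": [archive_rule("logs/"),
                       archive_rule("videos/", ia_days=60, glacier_days=180)]}

if __name__ == "__main__":
    print(json.dumps(lifecycle, indent=2))
```

With credentials configured, the result could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-archive-bucket", LifecycleConfiguration=lifecycle)`, where the bucket name is a placeholder.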

Data Lifecycle Management

(Chart: data access frequency over time, from T to T + 365 days.)

(Diagram: S3 management features around the Standard, Standard - Infrequent Access, and Amazon Glacier storage classes.)

• Cross-Region Replication
• Lifecycle Policy
• Data Classification & Management
• Event Notifications
• CloudWatch Metrics
• S3 Inventory
• Audit with CloudTrail Data Events
• Storage Analytics

Amazon S3: What’s New

Data-driven storage management for S3

• Analyze storage usage to transition the right data to the right storage class
• Understand how storage usage changes as your S3 objects get older
• Discover how much of your storage is retrieved over time

Manage your data: Data Classification and Management

Manage data based on what it is, as opposed to where it’s located

• Easy data management
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data

(Diagram: classification drives lifecycle policy and access control.)

Amazon Glacier

• Extremely low-cost archive storage service, starting at $0.004/GB/month

• 3 retrieval options: Expedited (1-5 min), Standard (3-5 hrs), Bulk (5-12 hrs)

• 99.999999999% durability (5-6 orders of magnitude higher than 2 copies of tape)

• All data is encrypted at rest

• Features: compliance, data management, cost management, audit logging

Glacier: Key Concepts

• Vaults – container for archives, up to 1,000 vaults per account per Region
• Archives – basic unit, write-once, 40 TB max, unlimited archives
• Inventory – cold index of archives, refreshed every 24 hours
• Access – three ways to access Glacier
• Uploads – multipart, lifecycle, cost optimizations, Snowball
• Data management – Vault Lock, tagging, audit logs
• Retrievals – retrieval policies, range retrievals, new feature announcements
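The multipart upload item and the 40 TB archive ceiling are linked by simple arithmetic. A minimal sketch, assuming the Glacier multipart limits from the Glacier API documentation rather than the slide: part sizes of 1 MiB times a power of two, at most 4 GiB, and at most 10,000 parts. Since 10,000 × 4 GiB ≈ 42.9 TB, this is consistent with the roughly 40 TB maximum. The helper name `min_part_size` is hypothetical.

```python
MiB = 1024 ** 2
MAX_PARTS = 10_000           # most parts allowed in one multipart upload
MAX_PART_SIZE = 4096 * MiB   # largest allowed part size (4 GiB)


def min_part_size(archive_bytes: int) -> int:
    """Smallest valid part size for a Glacier multipart upload of this archive.

    Valid part sizes are 1 MiB times a power of two, from 1 MiB up to 4 GiB,
    and the archive must fit in at most 10,000 parts.
    """
    part = MiB
    while part * MAX_PARTS < archive_bytes:
        part *= 2
        if part > MAX_PART_SIZE:
            raise ValueError("archive exceeds 10,000 parts of 4 GiB")
    return part


if __name__ == "__main__":
    for size in (2**30, 100 * 2**30, 2**40, 40 * 10**12):
        print(f"{size:>20,} bytes -> {min_part_size(size) // MiB} MiB parts")
```

The result would be passed, as a string, in the `partSize` argument of a multipart upload initiation (for example boto3's `initiate_multipart_upload`). Choosing the smallest valid size keeps each retried part cheap.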

Archive Consideration 1 – Total Archive Cost

Traditional archiving approaches

• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years

How can AWS help with your archival?

• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance

Consideration 2 – Durability

Amazon S3 and Glacier Durability

(Chart: 4 9s and 5 9s durability compared with the 11 9s, 99.999999999%, durability of S3-IA and Glacier.)

Durability for long-term preservation

Built-in Fixity Checking

Automatic recovery
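To make the "nines" concrete, the expected-value arithmetic can be sketched as follows. This is a simplification, assuming durability is read as a per-object annual survival probability and that losses are independent; exact fractions avoid floating-point noise. The function names are hypothetical.

```python
from fractions import Fraction


def expected_annual_losses(objects: int, nines: int) -> Fraction:
    """Expected objects lost per year when each object independently
    survives a year with probability 1 - 10**-nines."""
    return Fraction(objects, 10 ** nines)


def years_per_lost_object(objects: int, nines: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(objects, nines)


if __name__ == "__main__":
    n = 10_000_000
    for nines in (4, 5, 11):
        print(f"{nines} nines, {n:,} objects: "
              f"{float(expected_annual_losses(n, nines))} objects lost/year")
```

For 10 million objects, 4 nines means about 1,000 objects lost per year, while 11 nines means one object every 10,000 years on average.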

Consideration 3 – Accessibility

Amazon Glacier – Data Retrieval Tiers

• Expedited retrieval: emergency access, 1-5 minutes, $0.03/GB (e.g., last-minute play-out schedule swap)
• Standard retrieval: current model, 3-5 hours, $0.01/GB (e.g., disaster recovery)
• Bulk retrieval: batch/bulk access, 5-12 hours, $0.0025/GB (e.g., PB-scale re-transcoding or video/image analysis)

Positioned as replacements for both on-site and off-site tape.
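Choosing a tier is a trade between deadline and price. A minimal sketch, assuming the per-GB prices and slowest completion times listed for each tier (2017 list prices; per-request fees and later price changes are ignored). The table and function names are hypothetical.

```python
from decimal import Decimal

# tier: (slowest stated completion time in hours, USD per GB retrieved)
TIERS = {
    "Expedited": (Decimal(5) / 60, Decimal("0.03")),
    "Standard": (Decimal(5), Decimal("0.01")),
    "Bulk": (Decimal(12), Decimal("0.0025")),
}


def retrieval_cost(tier: str, gigabytes: int) -> Decimal:
    """Per-GB retrieval charge only; request fees are not modeled."""
    return TIERS[tier][1] * gigabytes


def cheapest_tier(deadline_hours: float) -> str:
    """Cheapest tier whose slowest stated completion time meets the deadline."""
    deadline = Decimal(str(deadline_hours))
    fits = [t for t, (hours, _) in TIERS.items() if hours <= deadline]
    if not fits:
        raise ValueError("no tier is guaranteed to finish in time")
    return min(fits, key=lambda t: TIERS[t][1])


if __name__ == "__main__":
    for tier in TIERS:
        print(tier, retrieval_cost(tier, 10_240))  # restoring 10 TB
```

Restoring 10 TB (10,240 GB) costs about $25.60 with Bulk, $102.40 with Standard, and $307.20 with Expedited, so a job that can wait overnight is a natural fit for Bulk.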

Consideration 4 - Application & Data Management

Accessing Glacier

1. S3 lifecycle integration
2. Direct Glacier API/SDK
3. Third-party tools and gateways (e.g., FastGlacier)
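The first two access paths differ most visibly at retrieval time. A minimal sketch, assuming boto3 clients are passed in (`boto3.client("s3")` and `boto3.client("glacier")`). `restore_object` and `initiate_job` are existing boto3 calls, but the helper names and their defaults are hypothetical.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def restore_via_s3(s3, bucket: str, key: str, tier: str = "Standard",
                   days: int = 7) -> None:
    """Path 1: object moved to Glacier by an S3 lifecycle rule.

    Requests a temporary readable copy for `days` days; the application keeps
    addressing the object by its bucket and key.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}")
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )


def retrieve_via_glacier(glacier, vault: str, archive_id: str,
                         tier: str = "Standard") -> str:
    """Path 2: archive uploaded directly to a vault (you must know its ID).

    Starts an asynchronous archive-retrieval job and returns the job ID; the
    data is downloaded later with get_job_output once the job completes.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}")
    resp = glacier.initiate_job(
        accountId="-",  # "-" means the account that owns the signing credentials
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval",
                       "ArchiveId": archive_id, "Tier": tier},
    )
    return resp["jobId"]
```

This contrast motivates the guidance later in the deck: with S3 lifecycle, the object key is the index, while with direct Glacier access your application must keep its own record of archive IDs.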

Use Glacier via S3 Lifecycle

• S3 Standard: active data, synchronous access, $0.023/GB/mo.
• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.
• Amazon Glacier: archive data, asynchronous access, $0.004/GB/mo.
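With the per-GB-month prices from this slide, the storage-only gap between classes is simple multiplication. A minimal sketch, assuming the slide's early-2017 prices. These vary by Region and over time, and the sketch ignores request, retrieval, and minimum-duration charges. The names are hypothetical.

```python
from decimal import Decimal

# USD per GB-month as quoted on the slide
PRICE_PER_GB_MONTH = {
    "S3 Standard": Decimal("0.023"),
    "S3 Standard-IA": Decimal("0.0125"),
    "Amazon Glacier": Decimal("0.004"),
}


def monthly_storage_cost(gigabytes: int) -> dict:
    """Storage-only monthly cost of keeping `gigabytes` in each class."""
    return {cls: price * gigabytes for cls, price in PRICE_PER_GB_MONTH.items()}


if __name__ == "__main__":
    for cls, cost in monthly_storage_cost(100 * 1024).items():  # 100 TB
        print(f"{cls:15} ${cost:,.2f}/month")
```

For 100 TB (102,400 GB) this gives about $2,355.20, $1,280.00, and $409.60 per month respectively.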

- Transition Standard to Standard-IA

- Transition Standard-IA to Amazon Glacier

- Transition based on object tags

- Expiration and versioning
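Tag-based transitions let a policy follow the type of data rather than its location. A minimal sketch, assuming the S3 lifecycle `Filter` shapes accepted by boto3's `put_bucket_lifecycle_configuration` (`Tag` for a single tag, `And` for combined conditions) and the `Tagging` shape of `put_object_tagging`. The tag key `data-class`, its values, and the helper names are hypothetical.

```python
def tag_filter(tags: dict, prefix: str = "") -> dict:
    """Lifecycle Filter matching objects that carry all `tags` (and `prefix`)."""
    tag_set = [{"Key": k, "Value": v} for k, v in sorted(tags.items())]
    if not tag_set:
        return {"Prefix": prefix}
    if len(tag_set) == 1 and not prefix:
        return {"Tag": tag_set[0]}
    # More than one condition must be wrapped in an "And" element
    combined = {"Tags": tag_set}
    if prefix:
        combined["Prefix"] = prefix
    return {"And": combined}


def glacier_rule_for(data_class: str, days: int) -> dict:
    """One rule per type of data: anything tagged with this class moves to
    Glacier `days` days after creation, regardless of its key prefix."""
    return {
        "ID": f"glacier-{data_class}",
        "Filter": tag_filter({"data-class": data_class}),
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Tagging an object classifies it (this dict is put_object_tagging's Tagging argument)
tagging = {"TagSet": [{"Key": "data-class", "Value": "raw-footage"}]}
lifecycle = {"Rules": [glacier_rule_for("raw-footage", 30)]}
```

Objects would be classified with `s3.put_object_tagging(Bucket=..., Key=..., Tagging=tagging)`. The single rule then covers every object carrying that tag, wherever it sits in the bucket.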

Data lifecycle management

(Chart: data access frequency over time, from T to T + 365 days; example: transition older videos to Standard-IA.)

Glacier Direct Upload – The Basics

1. Create vault
2. Configure access policies, e.g. the ArchiveApp user policy: Effect: Allow; Action: glacier:UploadArchive; Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films
3. Upload archives: UploadArchive(data) -> Archive ID
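Steps 2 and 3 can be sketched in a few lines. This is a minimal sketch, assuming a boto3 Glacier client is passed in. The vault name `Films` comes from the slide; the region, account ID, and helper names are placeholders.

```python
import json


def upload_only_policy(region: str, account_id: str, vault: str) -> str:
    """Step 2: IAM policy letting a user (e.g. ArchiveApp) upload to one vault only."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "glacier:UploadArchive",
            "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}",
        }],
    }, indent=2)


def upload(glacier, vault: str, body, description: str = "") -> str:
    """Step 3: upload `body` (bytes or a binary file object) as one archive.

    A single UploadArchive request accepts up to 4 GB; larger data needs a
    multipart upload. Glacier assigns the archive ID, so record it (e.g. in
    your own metadata database): you need it to retrieve the archive later.
    """
    resp = glacier.upload_archive(
        accountId="-",  # "-" means the account that owns the signing credentials
        vaultName=vault,
        archiveDescription=description,
        body=body,
    )
    return resp["archiveId"]
```

With a real client this might run as `glacier = boto3.client("glacier")`, `glacier.create_vault(accountId="-", vaultName="Films")`, and then, inside `with open("reel-001.mxf", "rb") as f:`, `archive_id = upload(glacier, "Films", f, "reel 1")`. The file name is hypothetical.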

Uploading Data: Internet or Sneakernet

AWS Direct Connect
Dedicated bandwidth between your site and AWS

Internet
Transfer data in a secure SSL tunnel over the public Internet

AWS Import/Export Snowball
Physical transfer of media into and out of AWS
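The choice between the network and a physical device starts with a transfer-time estimate. A minimal sketch, assuming decimal terabytes and a sustained utilization of 80% of the nominal link rate (an assumed figure, not a measured one). It does not model Snowball shipping or handling time, and `transfer_days` is a hypothetical name.

```python
def transfer_days(terabytes: float, link_gbps: float,
                  utilization: float = 0.8) -> float:
    """Days needed to push `terabytes` (decimal TB) through a network link.

    `utilization` is the fraction of the nominal link rate actually sustained.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400


if __name__ == "__main__":
    for gbps in (0.1, 1, 10):
        print(f"100 TB over {gbps} Gbps: {transfer_days(100, gbps):.1f} days")
```

At 80% utilization, 100 TB takes roughly 116 days over 100 Mbps, 11.6 days over 1 Gbps, and 1.2 days over 10 Gbps. At the slower rates, a device-based transfer becomes attractive.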

AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage

• 100 TB local storage
• Local compute equivalent to an Amazon EC2 m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb QSFP+ copper and optical networking
• Ruggedized and rack-mountable

re:Invent 2016 launch

Use cases: AWS Import/Export Snowball

• Cloud Migration
• Disaster Recovery
• Data Center Decommission
• Content Distribution

AWS storage migration expansion: AWS Snowmobile

Storage Gateway Enables Hybrid Storage Solutions

Use standard storage protocols to access AWS storage services.

(Diagram: on the customer premises, enterprise storage, backup servers, and application servers use Storage Gateway through NFS (File), iSCSI (Volume), and iSCSI VTL (Tape). Connectivity: Internet, Direct Connect, Amazon VPC. Storage targets: Amazon S3, Amazon Glacier, Amazon EBS snapshots. Integrated with AWS IAM, AWS KMS, AWS CloudTrail, and Amazon CloudWatch.)

Which option should I choose?

• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability

• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database

• Use 3rd party tools or AWS Storage Gateway to minimize coding

Media Archive Use Case

(Diagram sequence: media archive and metadata, transition to the cloud.)

• Starting point, corporate data center: onsite archive, Hierarchical Storage Manager, metadata (asset manager), processing tasks, on-premise tape, and an offsite tape archive.
• Step 1: an AWS Region with Amazon Glacier is added over AWS Direct Connect, with a cloud DAM syncing metadata from on-prem.
• Step 2: Amazon S3 and cloud-based processing tasks are added in the AWS Region.
• Step 3: an onsite cache is added in the corporate data center.

Media Solution: Sony DADC

Problem Statement:
• Challenged by on-prem legacy infrastructure.
• Provide a performant, secure, economical media distribution solution.
• Decrease time to market for their customers’ finished content.

Use of AWS:
• EC2 for content processing, and SWF, SQS, and SNS for media workflow automation.
• S3 for storage, Glacier for content archive.
• CloudFront for OTT.

Business Benefits:
• Workflow pipelines can be run in a highly parallelized fashion through AWS elastic scalability.
• Significantly shorter content delivery SLA, with a new AWS-enabled target of 1 hour.
• Fully migrating away from on-prem infrastructure.

On-demand cloud-based media supply chain and delivery solution

• Media distribution backbone (Ve.nue platform)
• Over-The-Top (OTT) broadcast service
• 20 PB of media assets, 1MM+ hours of high-res content
• Assets to be archived and retained for decades

Video archives

Comprehensive media lifecycle

@SonyDADCNMS

“If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”


Sony Migration

The Challenge

• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide

• Store 20 petabytes of motion picture and television content

• Equating to 1,000,000+ hours of content

• At a growth curve of ~1 petabyte every quarter

Desired Goals:

• One hour delivery turn around time

• Agile, scalable, predictable cost model & infrastructure

• Investing in innovation vs. hardware


On-Premise Asset Storage Workflow


AWS Cloud-based Asset Storage Workflow


Glacier vs. On-Prem Cost Comparison


Consideration 5 - Compliance and Retention

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy.

• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking

Compliance storage with Vault Lock

Glacier Vault Lock
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure designated third-party access and grant temporary access
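Time-based retention and a tag-driven legal hold can both be expressed as Deny statements in a Vault Lock policy. A minimal sketch, assuming the Glacier condition keys `glacier:ArchiveAgeInDays` and `glacier:ResourceTag/<key>`. The tag key `LegalHold`, the statement IDs, and the function name are hypothetical. It does not address who may add or remove the tag.

```python
import json


def vault_lock_policy(region: str, account_id: str, vault: str,
                      retention_days: int) -> str:
    """Vault Lock policy: time-based retention plus a tag-driven legal hold."""
    vault_arn = f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}"
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {   # No one may delete an archive younger than the retention period
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {"NumericLessThan": {
                    "glacier:ArchiveAgeInDays": str(retention_days)}},
            },
            {   # No one may delete any archive while the vault is tagged LegalHold=true
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {"StringEquals": {
                    "glacier:ResourceTag/LegalHold": "true"}},
            },
        ],
    }, indent=2)
```

Because both statements are Deny rules, they override any Allow granted elsewhere. This is what makes a locked policy act as a guardrail rather than a permission.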

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).

Proofpoint
• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap

Proofpoint SocialPatrol
• Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML (text, links, images, metadata)
• Ingesting >1M social posts per day
• Built in AWS

Proofpoint SocialPatrol Archive with Glacier

• SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock

(Diagram: PFPT in AWS: social feeds, policy engine, MySQL/C*/Solr, and Amazon Glacier & Vault Lock.)

Proofpoint SocialPatrol Archive

• The customer specifies the retention period in Proofpoint Social.
• Via the AWS API, we create a vault for that customer.
• Via the AWS API, we lock the vault and specify a policy to observe a legal hold via a tag.
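The per-customer flow above (create a vault, lock it, apply a legal-hold tag) maps onto four Glacier API calls. A minimal sketch, not Proofpoint's actual code, assuming a boto3 Glacier client and a JSON lock policy are passed in. The vault name, the `LegalHold` tag key, and the function name are hypothetical.

```python
def provision_locked_vault(glacier, vault: str, lock_policy: str,
                           legal_hold: bool = False) -> str:
    """Create a per-customer vault and lock it with `lock_policy`.

    Locking is two-step: initiate_vault_lock attaches the policy in an
    in-progress state and returns a lock ID; complete_vault_lock makes it
    immutable. An in-progress lock not completed within 24 hours expires.
    """
    glacier.create_vault(accountId="-", vaultName=vault)
    lock_id = glacier.initiate_vault_lock(
        accountId="-", vaultName=vault, policy={"Policy": lock_policy}
    )["lockId"]
    # In practice, test the in-progress policy here before making it permanent
    glacier.complete_vault_lock(accountId="-", vaultName=vault, lockId=lock_id)
    if legal_hold:
        # A vault-level tag that the locked policy can reference in a condition
        glacier.add_tags_to_vault(accountId="-", vaultName=vault,
                                  Tags={"LegalHold": "true"})
    return lock_id
```

The two-step design leaves a window to validate the policy: aborting with `abort_vault_lock` during the in-progress state is possible, but once `complete_vault_lock` succeeds, the policy can no longer be changed.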

Active-Archive Resources

• Amazon S3: https://aws.amazon.com/s3/

• Amazon S3 Deep Dive (re:Invent 2016): https://www.youtube.com/watch?v=bMhWWkhydFQ&t=249s

• Amazon Glacier: https://aws.amazon.com/glacier/

• Amazon Glacier Deep-Dive (re:Invent 2016): https://www.youtube.com/watch?v=dfr9mBcDJ-U

• WORM Compliance Assessment: https://aws.amazon.com/blogs/aws/glacier-cohasset-assessment/

• Sony Case Study: https://aws.amazon.com/solutions/case-studies/sony-dadc/

• Backup & Archive TCO Calculator: http://www.backuparchive.awstcocalculator.com/

Thank You!