
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks

Date post: 03-Mar-2017
Upload: amazon-web-services
Transcript
Page 1: Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Edward Naim, Head of Product, Amazon EFS
Darryl Osborne, Storage Specialist Solutions Architect
David Green, Enterprise Solutions Architect

February 23rd, 2017

Deep Dive on Amazon EFS

What to expect from this webinar

• Learn why and when to use Amazon EFS

• Understand key technical & security concepts

• Discover how to leverage EFS’s performance

• See EFS in action: Hands-on demos

• Review EFS’s economics

• Answer your questions (Q&A)

Why & When to Use Amazon EFS

The AWS Storage Portfolio (AWS Storage Platform and Solutions)

Object: Amazon S3, Amazon Glacier

Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)

File: Amazon EFS

Cloud Data Migration: Direct Connect, Snow* data transport family, 3rd Party Connectors, Transfer Acceleration, Storage Gateway, Kinesis Firehose

Amazon EFS attributes

1) Standard file system interface & semantics
2) Shared storage
3) Highly available
4) Highly durable
5) Consistent, low latencies
6) Scalable (storage & throughput)
7) Elastic capacity
8) Fully managed

We focused on changing the game

1. Simple
2. Elastic
3. Scalable

Highly durable

Highly available

Amazon EFS is Simple

• Fully managed
  - No hardware, network, file layer
  - Create a scalable file system in seconds!

• Seamless integration with existing tools and apps
  - NFS v4.1: widespread, open
  - Standard file system access semantics
  - Works with standard OS file system APIs

• Simple pricing = simple forecasting

Amazon EFS is Elastic

• File systems grow and shrink automatically as you add and remove files

• No need to provision storage capacity or performance

• You pay only for the storage space you use, with no minimum fee


Amazon EFS is Scalable

• File systems can grow to petabytes of capacity

• Throughput scales automatically as file systems grow

• Consistent low latencies regardless of file system size

• Support for thousands of concurrent NFS connections

High Durability & High Availability

• Every file system object is redundantly stored across multiple Availability Zones in a Region

• Designed to sustain Availability Zone offline conditions

• Superior to traditional NAS availability models

• Appropriate for production/tier 0 applications

In which Regions can I use EFS today?

• US West (Oregon)

• US East (N. Virginia)

• US East (Ohio)

• EU (Ireland)

More coming soon!

Do you need an EFS file system?

If you have an application (EC2 or on-premises) or use case that requires a file system AND

• Requires multi-attach OR
• GBs/s throughput OR
• Multi-AZ availability/durability OR
• Requires automatic scaling (grow/shrink) of storage

What customers are using EFS for today

• Web serving
• Content management
• Analytics
• Media and Entertainment workflows
• Workflow management
• Home directories
• Container storage
• Database backups

Understand Key Technical and Security Concepts

What is a file system?

• The primary resource in EFS

• Where you store files and directories

• Can create 125 file systems per account

What is a mount target?

• To access your file system within a VPC, you create mount targets in the VPC

• A mount target is an NFS endpoint that lives in your VPC

• A mount target has an IP address and a DNS name you use in your mount command

• A mount target is highly available

[Diagram: a Region containing a VPC that spans Availability Zones 1, 2, and 3, with EC2 instances and a mount target]

How to access a file system from an instance

• You “mount” a file system on an Amazon EC2 instance (standard command) — the file system appears like a local set of directories and files

• An NFS v4.1 client is standard on Linux distributions

mount -t nfs4 -o nfsvers=4.1 [file system DNS name]:/ /[user’s target directory]

How does it all fit together?

[Diagram: a Region containing a VPC that spans Availability Zones 1, 2, and 3, with EC2 instances accessing a single file system]

Data can be accessed from any AZ in the Region while maintaining full consistency

Several security mechanisms

Control network traffic to and from file systems (mount targets) by using VPC security groups and network ACLs

Control file and directory access by using POSIX permissions

Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM)

EFS supports action-level and resource-level permissions

Access your EFS file system via AWS Direct Connect

[Diagram: on-premises servers connected through Direct Connect to EFS in your Amazon VPC]

Direct Connect support addresses three of four hybrid scenarios

Bursting

Migration

Tiering

Backup / DR

Learn How to Leverage EFS’s Performance

Amazon EFS is designed for a wide spectrum of performance needs

High throughput and parallel I/O
• Genomics
• Big data analytics
• Scale-out jobs

Low latency and serial I/O
• Home directories
• Content management
• Web serving
• Metadata-intensive jobs

Choose the performance mode best suited to your workload

General Purpose (default)
  What’s it for? Latency-sensitive applications and general-purpose workloads
  Advantages: Lowest latencies for file operations
  Tradeoffs: Limit of 7,000 ops/sec
  When to use: Best choice for most workloads

Max I/O
  What’s it for? Large-scale and data-heavy applications
  Advantages: Virtually unlimited ability to scale out throughput/IOPS
  Tradeoffs: Slightly higher latencies
  When to use: Consider if 10s (or more) instances access your file system concurrently

Use the PercentIOLimit CloudWatch metric to determine if you’re constrained by General Purpose mode

Amazon EFS has a distributed data storage design

• File systems distributed across unconstrained number of servers

• Avoids bottlenecks/constraints of traditional file servers

• Enables high levels of aggregate IOPS/throughput

• Data also distributed across Availability Zones (durability, availability)

How to think about EFS perf relative to EBS

Performance

Per-operation latency
  Amazon EFS: Low, consistent
  Amazon EBS PIOPS: Lowest, consistent

Throughput scale
  Amazon EFS: Multiple GBs per second
  Amazon EBS PIOPS: Single GB per second

Characteristics

Data availability / durability
  Amazon EFS: Stored redundantly across multiple AZs
  Amazon EBS PIOPS: Stored redundantly in a single AZ

Access
  Amazon EFS: 1 to 1000s of EC2 instances, from multiple AZs, concurrently
  Amazon EBS PIOPS: Single EC2 instance in a single AZ

Use cases
  Amazon EFS: Big Data and analytics, media processing workflows, content management, web serving, home directories
  Amazon EBS PIOPS: Boot volumes, transactional and NoSQL databases, data warehousing & ETL

An implication of per-operation latency: I/O size impacts throughput of serialized operations

[Chart: throughput (y-axis) versus I/O size (x-axis) for I/O sizes of 4 KB, 32 KB, 256 KB, 2 MB, and 16 MB]
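
The point of this slide can be illustrated with a simple model: if every serialized operation pays a fixed per-operation latency plus the time needed to move its bytes, throughput is the I/O size divided by that total time. The sketch below is only an illustration. The 5 ms overhead and 100 MB/s transfer rate are assumed numbers, not EFS measurements, and `serial_throughput` is a hypothetical helper.

```shell
#!/bin/sh
# serial_throughput IO_SIZE_KB OVERHEAD_MS RATE_MB_PER_S
# Prints the estimated MB/s for back-to-back (serialized) operations.
# Model (an assumption, not an EFS specification):
#   time per operation = fixed overhead + size / transfer rate
serial_throughput() {
    awk -v kb="$1" -v ms="$2" -v rate="$3" 'BEGIN {
        size_mb = kb / 1024
        seconds = ms / 1000 + size_mb / rate
        printf "%.2f\n", size_mb / seconds
    }'
}

# I/O sizes from the chart: 4 KB, 32 KB, 256 KB, 2 MB, 16 MB.
for kb in 4 32 256 2048 16384; do
    printf '%6s KB: %s MB/s\n' "$kb" "$(serial_throughput "$kb" 5 100)"
done
```

Under these assumed numbers, small operations are dominated by the fixed overhead, while large operations approach the transfer rate.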

How to take advantage of EFS’s distributed architecture: Parallelize

Parallelize via multiple threads and/or multiple instances

[Chart: aggregate IOPS of parallel writes using 10 m4.xlarge instances; x-axis: number of total threads (0 to 160); y-axis: IOPS (0 to 30,000)]

Use CloudWatch for a number of views of file system performance

• DataReadIOBytes, DataWriteIOBytes, MetadataIOBytes, TotalIOBytes: Measure throughput (‘Sum’ of bytes divided by seconds in time period) or ops/sec (‘Data Samples’ divided by seconds in time period)

• BurstCreditBalance: Monitor your burst credit usage over time to ensure sufficient throughput capacity

• PermittedThroughput: Compare to actual throughput to determine whether you’re being constrained by the burst model

• ClientConnections: View the number of clients connected to your file system

• PercentIOLimit: Determine whether you’re being constrained by General Purpose mode (PercentIOLimit at or near 100%)
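
As a worked example of the throughput and ops/sec arithmetic, the sketch below converts a ‘Sum’ and a ‘Data Samples’ value for one time period into per-second rates. The input values are made up for illustration, and the function names are hypothetical.

```shell
#!/bin/sh
# throughput_bytes_per_sec SUM_BYTES PERIOD_SECONDS
# 'Sum' of bytes divided by the seconds in the time period.
throughput_bytes_per_sec() {
    echo $(( $1 / $2 ))
}

# ops_per_sec DATA_SAMPLES PERIOD_SECONDS
# 'Data Samples' divided by the seconds in the time period.
ops_per_sec() {
    echo $(( $1 / $2 ))
}

# Illustrative values for one 60-second period of TotalIOBytes.
sum_bytes=629145600
data_samples=18000
period=60

echo "throughput: $(throughput_bytes_per_sec "$sum_bytes" "$period") bytes/s"
echo "operations: $(ops_per_sec "$data_samples" "$period") ops/s"
```

In practice the inputs would come from CloudWatch, for example from the console graphs or from `aws cloudwatch get-metric-statistics` against the `AWS/EFS` namespace with the `Sum` and `SampleCount` statistics.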

Recommended kernel version and NFS mount options

Kernel version

Use Linux kernel 4.0+ (e.g., Amazon Linux 2016.03.0, Ubuntu 15.10 or 16.04)

Mount options

• Mount via NFSv4.1
• Specify 1MB read/write buffers (“rsize”/”wsize”)
• Ensure operations are asynchronous

Recommend the following mount options:

-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
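
Putting the recommended options together, a full mount command could be assembled as in the sketch below. It only prints the command so that it can be reviewed before being run as root. The file system DNS name and mount point shown are placeholders, and `efs_mount_command` is a hypothetical helper, not part of any AWS tooling.

```shell
#!/bin/sh
# Mount options recommended on this slide.
EFS_MOUNT_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async"

# efs_mount_command FILE_SYSTEM_DNS_NAME MOUNT_POINT
# Prints (does not run) the mount command for review.
efs_mount_command() {
    printf 'mount -t nfs4 -o %s %s:/ %s\n' "$EFS_MOUNT_OPTS" "$1" "$2"
}

# Placeholder values; substitute your own file system DNS name and directory.
efs_mount_command "file-system-dns-name.example" "/mnt/efs"
```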

See EFS in Action: Move Data

Goal: Move Data Quickly!!

Two Scenarios:

Transferring media assets to EFS

• Size ranges from a few GB to 100+GB per file

• Data sources:

• Amazon S3

• Amazon EBS

Transferring many small files to EFS

• Size ranges from 64K to 256K

• Data sources:

• Amazon S3

• Amazon EBS

Serial vs Parallel

Serial file transfer

Parallel file transfer

How do we do this?

GNU parallel

• Tool for executing jobs in parallel
• Similar to xargs
• Replace loops in shell scripts
• GNU parallel makes sure output from the commands is the same output as you would get if you had run the commands sequentially

https://www.gnu.org/software/parallel/

For people who live life in the parallel lane

Use parallel threads – GNU parallel

# Create destination directory tree from source

find . -type d -print0 | parallel -j $N_THREADS -0 "mkdir -p ${DST_DIR}/{}" > /dev/null 2>&1

# Copy files

find . ! \( -type d \) -print0 | parallel -j $N_THREADS -0 "cp -f {} ${DST_DIR}/{}"
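
Since GNU parallel is described earlier as similar to xargs, the same two-step copy can be sketched with `xargs -P` on machines where GNU parallel is not installed. This is a substitute technique, not the method used in the talk. It assumes an xargs that supports `-0`, `-P`, and `-I` (GNU and BSD xargs do), and `parallel_copy` and its arguments are hypothetical names.

```shell
#!/bin/sh
# parallel_copy SRC_DIR DST_DIR [N_THREADS]
# Copies a directory tree using N_THREADS concurrent workers via xargs -P.
parallel_copy() {
    src_dir=$1
    dst_dir=$2
    n_threads=${3:-4}
    mkdir -p "$dst_dir" || return 1
    # Resolve the destination to an absolute path before changing directory.
    dst_abs=$(cd "$dst_dir" && pwd) || return 1
    (
        cd "$src_dir" || exit 1
        # Create the destination directory tree first so file copies never
        # target a missing parent directory.
        find . -type d -print0 |
            xargs -0 -P "$n_threads" -I{} mkdir -p "$dst_abs/{}"
        # Copy everything that is not a directory, n_threads copies at a time.
        find . ! -type d -print0 |
            xargs -0 -P "$n_threads" -I{} cp -f {} "$dst_abs/{}"
    )
}

# Example (hypothetical paths): parallel_copy ./assets /mnt/efs/assets 32
```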

Optimizing Transfers

Monitoring performance

• Data-driven results
• Repeatable outcomes
• Optimize for costs

Benchmark different instance types

• Determine the optimal instance size
• What is best? T2, C3, C4, M3, M4, R3, X?
• Transfer test set of 1000 small files
• Increase thread count from 1-1024 concurrent threads

Tools

• Command orchestration
• Instance configuration
• Log collection
• Visualization
• Instance performance

Test Results – Large Files

Large Files: Four Instances

Large Files: Four Instances

Adding Additional Instances

Large File: 50 Instances

Test Results – Small Files

Small File Performance - Instance Family Test

~200 threads

c3.large – 5,342 files per minute @ 200 threads

Increase Instance Count

• Using optimal instance size
  • c3.large

• Using optimal thread counts
  • ~200 per instance

• Increase instance count
  • 300 instances

• Optimize for costs
  • EC2 Spot Market

EC2 Spot

c3.large – 300 instances

Summary / tl;dr

Results

Small files – 300 instances

Large files – 50 instances

Demo

Summary / tl;dr

• Parallelize everything
  • Threads
  • Instances

• Test, test, test
  • Capture & analyze test data
  • Less than $5/hr for 300 instances

See EFS in Action: Web Serving

Content Management & Web Serving

Web-based applications for creating and managing website content.

• wikis
• blogs
• discussion boards

Free and open-source content management system hosted on a web platform

Web software to create beautiful websites, blogs, or apps

“Free and priceless at the same time” – WordPress.org

CODE IS POETRY

WordPress is Popular

27% of all websites (November 2016) – Web Technology Surveys

Easiest and most popular blogging system in use on the Web – CMS Usage Statistics

Supporting more than 60 million websites – Forbes

How are people running WordPress today?

Available as:

• Managed Web Hosting Service

• Software package from WordPress.org installed on self-provisioned web platform… like AWS

Amazon EC2: Web server (Amazon Linux, Apache, PHP, OPCache)

Amazon EFS: Unstructured data (directories, PHP files, config, themes, plugins, etc.)

Amazon RDS: Structured data (posts, pages, comments, categories, tags, etc.)

WordPress Demo

Reference Architecture https://aws.amazon.com/architecture/

Coming Soon

Economics

Simple and predictable pricing

• With Amazon EFS, you pay only for the storage space you use
  - No minimum commitments or up-front fees
  - No need to provision storage in advance
  - No other fees, charges, or billing dimensions

• EFS price:
  - $0.30/GB-month (US Regions)
  - $0.33/GB-month (EU Ireland)

Typical multi-AZ file system setup without EFS

[Diagram: a Region with Availability Zones 1, 2, and 3. EC2 compute nodes manage a 3rd-party file system layer on replicated EBS storage volumes, with inter-AZ traffic for replication. An EC2 NFS client accesses the file system over NFS.]

TCO example

Let’s say you need to store ~500 GB and require high availability and durability

Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilization) and fully replicate the data to a second Availability Zone for availability/durability

Example comparative cost:
  Storage (2x 600 GB EBS gp2 volumes): $120 per month
  Compute (2x m4.xlarge instances): $350 per month
  Inter-AZ data transfer costs (est.): $129 per month
  Total: $599 per month

EFS cost is (500GB * $0.30/GB-month) = $150 per month, with no additional charges
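
The comparison on this slide can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted here, which are 2017 prices from the talk, so substitute current prices before relying on the result. The function names are hypothetical.

```shell
#!/bin/sh
# diy_monthly_cost STORAGE COMPUTE INTER_AZ   (whole USD per month)
# Sums the three cost components of the EBS-based setup.
diy_monthly_cost() {
    echo $(( $1 + $2 + $3 ))
}

# efs_monthly_cost GB_STORED PRICE_PER_GB_MONTH
# EFS bills only for the storage space used.
efs_monthly_cost() {
    awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

# Figures from the slide: storage, compute, and inter-AZ transfer for the
# EBS-based setup; 500 GB at $0.30/GB-month for EFS.
echo "EBS-based setup: \$$(diy_monthly_cost 120 350 129) per month"
echo "Amazon EFS:      \$$(efs_monthly_cost 500 0.30) per month"
```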

Summary

Key Recommendations

• Test your application!

• Use General Purpose mode for lowest latency, Max-I/O for scale-out

• Use Linux kernel version 4.0 or newer, mount via NFSv4.1

• To optimize, look for opportunities to:
  • Aggregate I/O
  • Perform async operations
  • Parallelize (demo later)
  • Cache (demo later)

• Don’t forget to check your burst credit earn/spend rate when testing – ensure sufficient amount of storage

Coming Soon: Encryption of data at rest

• Integrated with AWS Key Management Service
• Encryption/decryption handled transparently
• No extra cost

Additional Resources

Amazon EFS Site: https://aws.amazon.com/efs/

Amazon EFS User Guide: https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html

AWS 10-Minute Tutorials: https://aws.amazon.com/getting-started/tutorials/

Reference Architecture (WordPress on EFS coming soon): https://aws.amazon.com/architecture/

qwikLABS: https://aws.qwiklabs.com/

YouTube: Amazon Web Services Channel

Thank you!

