
AWS June Webinar Series - Best Practices: Dynamic Data Ingestion with S3 and Lambda

Date post: 28-Jul-2015
Upload: amazon-web-services
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Vyom Nagrani, Sr. Product Manager, AWS Lambda June 16, 2015 Dynamic Data Ingestion with Amazon S3 and AWS Lambda
Transcript


Amazon S3 Event Notifications: Integrating storage and workflows

Delivers notifications to Amazon SNS, Amazon SQS, or AWS Lambda when events occur in Amazon S3

[Diagram: Amazon S3 events are delivered as notifications to an SNS topic, an SQS queue, or a Lambda function]
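A Lambda function subscribed to these notifications receives the event as a JSON document. The following is a minimal sketch in Python, not code from the webinar: the function names and the sample bucket and key are illustrative assumptions, while the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event message structure. Object keys arrive URL-encoded, so the sketch decodes them before use.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key, event_name) tuples from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in the event (a space arrives as '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key, record.get("eventName", "")))
    return objects


def lambda_handler(event, context):
    """Log every object named in the event and return the record count."""
    for bucket, key, event_name in extract_objects(event):
        print(f"{event_name}: s3://{bucket}/{key}")
    return len(event.get("Records", []))
```

Keeping the parsing in `extract_objects` separate from the handler makes it easy to exercise locally with a hand-written event before deploying.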

Benefits of Amazon S3 Notifications for dynamic data ingestion

• Integration – A new surface on the Amazon S3 “building block” for event-based computing

• Speed – Typical time to send notifications is less than a second

• Simplicity – Avoids proxies or polling to detect changes

[Diagram: a proxy or list/diff polling, or notifications]

AWS Lambda: A compute service that runs your code in response to events

Lambda functions: Stateless, event-driven code execution

Triggered by events:

• Put to an Amazon S3 bucket

• Record in an Amazon Kinesis stream

• Direct sync and async invocations

Makes it easy to

• Build back-end services that perform at scale

• Perform data-driven auditing, analysis, and notification

Benefits of AWS Lambda for building a serverless data processing engine

“Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud”

1. No infrastructure to manage: Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.

2. High performance at any scale; cost-effective and efficient: Pay only for what you use. Lambda automatically matches capacity to your request rate. Purchase compute in 100ms increments.

3. Bring your own code: Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.
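To make the 100ms billing increment concrete, here is a small arithmetic sketch in Python. It assumes that each invocation's duration is rounded up to the next increment; the function names are hypothetical and no prices are modeled.

```python
import math


def billed_duration_ms(actual_ms, increment_ms=100):
    """Round one invocation's duration up to the next billing increment."""
    if actual_ms <= 0:
        return 0
    return math.ceil(actual_ms / increment_ms) * increment_ms


def billed_total_ms(durations_ms, increment_ms=100):
    """Sum the billed durations of several invocations."""
    return sum(billed_duration_ms(d, increment_ms) for d in durations_ms)
```

Under this assumption, a function that runs for 230 ms is billed for 300 ms, and three invocations of 30, 120, and 250 ms are billed for 600 ms in total.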

What you can do with S3+Lambda

Customers have told us about powerful applications …

… and we look forward to seeing what you create.

Today’s demo #1: Workflow of a simple video transcoding application

[Diagram: a new video is uploaded to Amazon S3, which sends a notification to AWS Lambda; Lambda writes its output to Amazon S3]

Walkthrough of setting up S3 event notifications and Lambda functions through the AWS Console

Code walkthrough for video clip transcode

• Set up variables

• Serialize steps

• Get file from S3

• Write to disk

• ffmpeg transcode

• Read from disk

• Upload to S3
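The steps above can be sketched in Python as follows. This is not the code shown in the demo: the function names, the `DEST_BUCKET` environment variable, the `transcoded/` key prefix, and the `.webm` output format are all assumptions made for illustration, and the sketch presumes an `ffmpeg` binary is packaged with the function. The S3 client is passed in so the steps can be exercised without AWS.

```python
import os
import subprocess
import tempfile
from urllib.parse import unquote_plus


def build_ffmpeg_command(src_path, dst_path):
    """Build the ffmpeg call; -y overwrites the output file if it exists."""
    return ["ffmpeg", "-y", "-i", src_path, dst_path]


def transcode_object(s3, bucket, key, dest_bucket, run=subprocess.check_call):
    """Download one object, transcode it with ffmpeg, and upload the result."""
    name = os.path.basename(key)
    base = os.path.splitext(name)[0]
    dest_key = "transcoded/" + base + ".webm"  # hypothetical output layout
    with tempfile.TemporaryDirectory() as workdir:
        src_path = os.path.join(workdir, name)
        dst_path = os.path.join(workdir, "out-" + base + ".webm")
        s3.download_file(bucket, key, src_path)          # get file from S3, write to disk
        run(build_ffmpeg_command(src_path, dst_path))    # ffmpeg transcode
        s3.upload_file(dst_path, dest_bucket, dest_key)  # read from disk, upload to S3
    return dest_key


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above run without the AWS SDK

    s3 = boto3.client("s3")
    dest_bucket = os.environ["DEST_BUCKET"]  # assumed configuration
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(transcode_object(s3, bucket, key, dest_bucket))
    return results
```

Writing the output to a separate bucket is a deliberate choice in this sketch: writing it back to the bucket that triggers the function would fire the notification again for the transcoded file.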

Demo #1: Automatic video transcoding with Amazon S3 and AWS Lambda

Potential further additions to a production video transcoding application

• Include custom transcoding/watermarking libraries

• Break longer video files into smaller clips, transcode each clip separately

• Transcode to multiple formats by running multiple Lambda functions in parallel

• Send S3 event notification to an SNS topic

• Subscribe multiple Lambda functions to that SNS topic
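When the S3 notification is routed through an SNS topic, each subscribed Lambda function receives the S3 event wrapped inside an SNS record, with the original notification carried as a JSON string in `Sns.Message`. A minimal sketch in Python of unwrapping it; the function name is hypothetical.

```python
import json


def s3_records_from_sns(event):
    """Unwrap S3 notification records from an SNS-delivered Lambda event."""
    records = []
    for sns_record in event.get("Records", []):
        # SNS carries the original S3 notification as a JSON string.
        message = json.loads(sns_record["Sns"]["Message"])
        # Messages without a Records list (for example a test event) yield nothing.
        records.extend(message.get("Records", []))
    return records
```

Each function subscribed to the topic can call this helper and then apply its own output format to the same uploaded object.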

Today’s demo #2: Workflow of infrastructure monitoring and automation application

[Diagram components: AWS CloudTrail, Amazon S3, Notification, AWS Lambda, Amazon SNS, AWS IAM; one element is labeled Optional]

Code walkthrough for infrastructure monitoring

• Get file from S3

• Unzip it

• Parse it

• Check activity

• Find patterns

• Take action
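The unzip, parse, check, and act steps above can be sketched in Python as follows. This is an illustration rather than the demo's code: the set of watched event names, the function names, and the alert wording are assumptions. It relies on CloudTrail log files being gzip-compressed JSON with a top-level `Records` list, and the SNS client is passed in so the logic can be tested without AWS.

```python
import gzip
import json

# Hypothetical policy: API calls that should raise an alert.
WATCHED_EVENTS = {"StopLogging", "DeleteTrail", "CreateUser"}


def parse_cloudtrail_log(compressed_bytes):
    """Unzip and parse one CloudTrail log file into its list of records."""
    document = json.loads(gzip.decompress(compressed_bytes).decode("utf-8"))
    return document.get("Records", [])


def find_suspicious(records, watched=WATCHED_EVENTS):
    """Return a summary for every record whose eventName is being watched."""
    findings = []
    for record in records:
        if record.get("eventName") in watched:
            identity = record.get("userIdentity", {})
            findings.append({
                "eventName": record["eventName"],
                "actor": identity.get("arn", "unknown"),
                "eventTime": record.get("eventTime"),
            })
    return findings


def notify(sns, topic_arn, findings):
    """Publish one alert to an SNS topic; return False when nothing was found."""
    if not findings:
        return False
    lines = [f"{f['eventTime']} {f['eventName']} by {f['actor']}" for f in findings]
    sns.publish(TopicArn=topic_arn, Subject="CloudTrail alert", Message="\n".join(lines))
    return True
```

In a Lambda handler, the compressed bytes would come from reading the object named in the S3 event, and `sns` would be a `boto3.client("sns")`.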

Demo #2: Infrastructure monitoring and automation using AWS CloudTrail, Amazon S3 and AWS Lambda

Potential further additions to a production infrastructure monitoring and automation

• In addition to monitoring and alarming, create automated actions in response to policy violations or suspicious activity

• Create .config file with multiple check points

• Each check can have a different SNS topic to alarm against

• Aggregate CloudTrail log files to be delivered to a single admin S3 bucket across all your AWS accounts

Today’s demo #3: Workflow of automated file de-duplication on upload

[Diagram: a new file is uploaded to Amazon S3, which sends a notification to AWS Lambda; the diagram also shows Amazon S3 and Amazon DynamoDB, with one element labeled Optional]

Code walkthrough for automated file de-duplication

• Get headObject

• List other objects

• Compare eTags

• Take action
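A minimal Python sketch of the steps above, using the boto3 S3 client calls `head_object`, the `list_objects_v2` paginator, and `delete_object`. The function names are hypothetical, and treating "take action" as deleting the newly uploaded copy is an assumption. ETags of multipart uploads are not a plain MD5 of the content, so an ETag match is only a first-pass signal.

```python
def find_duplicates(s3, bucket, key):
    """Return keys of other objects in the bucket whose ETag matches `key`."""
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"]
    duplicates = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket page has no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"] != key and obj["ETag"] == etag:
                duplicates.append(obj["Key"])
    return duplicates


def deduplicate(s3, bucket, key):
    """Delete the new object if another object has the same ETag."""
    duplicates = find_duplicates(s3, bucket, key)
    if duplicates:
        # Assumed policy: keep the existing copies and drop the new upload.
        s3.delete_object(Bucket=bucket, Key=key)
    return duplicates
```

Listing the whole bucket on every upload grows slower as the bucket fills, which is the motivation for keeping an index of hashes instead.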

Demo #3: Automatic File De-duplication using Amazon S3 and AWS Lambda

Potential further additions to a production automated file de-duplication

• Create and compare SHA hash for each file instead of using S3 eTag to reduce collisions

• Handle collision situations by calling another Lambda function to do a full file compare

• Index all hashes to a DynamoDB table, check against table instead of reading all files in the bucket each time a new file is uploaded/edited

• Create Lambda wrapper around deleteObject API call to update index table
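The hash-and-index idea can be sketched in Python as follows. The `HashIndex` class is a hypothetical in-memory stand-in for the DynamoDB table, used only to show the lookup logic; SHA-256 is chosen here as one member of the SHA family, and all names are assumptions.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hash file content supplied as an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


class HashIndex:
    """In-memory stand-in for a table keyed by content hash."""

    def __init__(self):
        self._key_by_hash = {}

    def register(self, content_hash, key):
        """Record `key`; return the existing key if this content is a duplicate."""
        existing = self._key_by_hash.get(content_hash)
        if existing is None:
            self._key_by_hash[content_hash] = key
            return None
        # A repeated event for the same key is not a duplicate of itself.
        return None if existing == key else existing

    def remove(self, content_hash, key):
        """Drop the entry when the indexed object is deleted."""
        if self._key_by_hash.get(content_hash) == key:
            del self._key_by_hash[content_hash]
```

Hashing chunk by chunk avoids holding a large object in memory at once. With a real DynamoDB table, the check-then-write in `register` would need to be a single conditional write so that two concurrent uploads cannot both register as the first copy.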

Things to remember about S3 Notifications

• Amazon S3 event notifications are set up at the bucket level

• Highly reliable – designed for nine ‘9’s with at least once delivery

• Currently supports Put, Post, Copy, MultiPartComplete, and RRSObjectLost events

• Configuration stored as XML in the notification subresource associated with a bucket

• No additional charge for S3 Notifications
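As an illustration of the bucket-level configuration, the sketch below builds the notification settings as the Python dictionary accepted by the boto3 call `put_bucket_notification_configuration`, rather than as raw XML. The mapping from the event names on the slide to S3 event type strings, the function names, and the example ARN used in testing are assumptions of this sketch.

```python
# Slide names mapped to S3 event type strings.
EVENT_TYPES = {
    "Put": "s3:ObjectCreated:Put",
    "Post": "s3:ObjectCreated:Post",
    "Copy": "s3:ObjectCreated:Copy",
    "MultiPartComplete": "s3:ObjectCreated:CompleteMultipartUpload",
    "RRSObjectLost": "s3:ReducedRedundancyLostObject",
}


def lambda_notification_config(function_arn,
                               event_names=("Put", "Post", "Copy", "MultiPartComplete")):
    """Build a notification configuration that targets one Lambda function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": [EVENT_TYPES[name] for name in event_names],
            }
        ]
    }


def apply_notification_config(s3, bucket, function_arn):
    """Store the configuration on the bucket using a boto3 S3 client."""
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=lambda_notification_config(function_arn),
    )
```

The put call replaces the bucket's whole notification configuration, so any existing SNS or SQS targets need to be included in the dictionary as well.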

Attaching a Lambda function to S3 Notifications

• Automatic Scaling: Both S3 and Lambda scale automatically with higher PUT rates

• Lambda has a default limit of 1000 TPS, which can be raised through the AWS Support Center

• Lambda queues all incoming requests from S3

• Lambda can absorb reasonable bursts of traffic for approximately 15-30 minutes

[Diagram: source events flow into S3, which scales automatically, then through the Lambda frontend queue to Lambda functions, which scale with higher PUT rates, and on to destinations 1 and 2]

Best practices for creating Lambda functions

• Memory: CPU proportional to the memory configured

• Increasing memory makes your code execute faster (if CPU bound)

• Timeout: Increasing the timeout allows for longer-running functions, but means a longer wait when errors occur

• Retries: For S3, Lambda retries each function at least 3 times

• Events rejected by AWS Lambda may be retained and retried by S3 for 24 hours

• Permission model: S3 pushes events to Lambda, so grant S3 invocation permission through a resource policy, and give the Lambda function an execution role
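The resource-policy half of the permission model can be sketched in Python with the boto3 Lambda client call `add_permission`. The helper names and the statement ID scheme are assumptions of this sketch; restricting the grant with `SourceArn` and `SourceAccount` limits invocation to the named bucket in the named account.

```python
def s3_invoke_permission(function_name, bucket, account_id):
    """Build add_permission arguments letting one bucket invoke a function."""
    return {
        "FunctionName": function_name,
        # Statement IDs cannot contain dots, which bucket names may include.
        "StatementId": "s3-invoke-" + bucket.replace(".", "-"),
        "Action": "lambda:InvokeFunction",
        "Principal": "s3.amazonaws.com",
        "SourceArn": "arn:aws:s3:::" + bucket,
        "SourceAccount": account_id,
    }


def grant_s3_invoke(lambda_client, function_name, bucket, account_id):
    """Attach the statement to the function's resource policy."""
    return lambda_client.add_permission(
        **s3_invoke_permission(function_name, bucket, account_id)
    )
```

The execution role is separate: it is the IAM role the function itself assumes, and it needs whatever the code touches, such as reading and writing the S3 objects and writing logs.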

Monitoring and Debugging Lambda functions

• Monitoring: available in Amazon CloudWatch Metrics

• Invocation count

• Duration

• Error count

• Throttle count

• Debugging: available in Amazon CloudWatch Logs

• All Metrics

• Custom logs

• RAM consumed

• Search for log events

• Real time feed of log events delivered to an Amazon Kinesis stream
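The four metrics listed above can be read back programmatically. The sketch below builds requests for the boto3 CloudWatch call `get_metric_statistics` against the `AWS/Lambda` namespace, where the metrics are named `Invocations`, `Duration`, `Errors`, and `Throttles`. The helper names, the one-hour window, the five-minute period, and the choice of statistic per metric are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Metric name mapped to the statistic requested for it (an assumed choice).
LAMBDA_METRICS = {
    "Invocations": "Sum",
    "Duration": "Average",
    "Errors": "Sum",
    "Throttles": "Sum",
}


def metric_requests(function_name, end_time, hours=1, period_seconds=300):
    """Build one get_metric_statistics request per Lambda metric."""
    start_time = end_time - timedelta(hours=hours)
    return [
        {
            "Namespace": "AWS/Lambda",
            "MetricName": metric,
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "StartTime": start_time,
            "EndTime": end_time,
            "Period": period_seconds,
            "Statistics": [statistic],
        }
        for metric, statistic in LAMBDA_METRICS.items()
    ]


def fetch_metrics(cloudwatch, function_name, end_time=None):
    """Return the datapoints for each metric, keyed by metric name."""
    end_time = end_time or datetime.now(timezone.utc)
    return {
        request["MetricName"]: cloudwatch.get_metric_statistics(**request)["Datapoints"]
        for request in metric_requests(function_name, end_time)
    }
```

Here `cloudwatch` would be a `boto3.client("cloudwatch")`; a non-zero `Throttles` sum is a hint that the function is running into its concurrency limit.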

Customers running dynamic data ingestion and processing using S3+Lambda

“I want to apply custom logic to process content being uploaded to my data store”.

• Watermarking / thumbnailing

• Transcoding

• Indexing and deduplication

• Aggregation and filtering

• Pre-processing

• Content validation

[Diagram: events from an Amazon S3 bucket trigger AWS Lambda, which produces transcoded files, indexing tables, or notifications]

Three Next Steps

1. Enable S3 notification feature on your existing S3 buckets. Amazon S3 event notifications can be sent in response to actions taken on objects uploaded or stored in Amazon S3.

2. Create and test your first Lambda function. With AWS Lambda, there are no new languages, tools, or frameworks to learn. You can use any third party library, even native ones.

3. Use AWS Lambda to process Amazon S3 objects … no infrastructure to manage, and set up a dynamic data ingestion pipeline in minutes!

Thank you!

Visit http://aws.amazon.com/s3, the AWS blog, and the S3 forum to learn more and get started using S3.

Visit http://aws.amazon.com/lambda, the AWS Compute blog, and the Lambda forum to learn more and get started using Lambda.

AWS Summit – Chicago: An exciting, free cloud conference designed to educate and inform new customers about the AWS platform, best practices and new cloud services.

Details

• July 1, 2015

• Chicago, Illinois

• @ McCormick Place

Featuring

• New product launches

• 36+ sessions, labs, and bootcamps

• Executive and partner networking

Registration is now open

• Come and see what AWS and the cloud can do for you.

• Click here to register: http://amzn.to/1RooPPL

