© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Asha Chakrabarty, Senior Solutions Architect, AWS
Will White, Engineering Lead, Mapbox
December 1, 2016
Running Batch Processes on ECS
CON310
What to Expect from the Session
• Understand the challenges of running batch processes
• Why Amazon ECS for Batch?
• Architectural Design Patterns
• Best Practices
• Mapbox and Amazon ECS
Challenges of Running Batch Workloads
• Typically resource intensive
• Time constraint for completion
• Potential impact to concurrent batch jobs
• Scaling infrastructure resources
• Ensuring effective resource utilization and cost savings
• Fragile and unreliable
What Batch Workloads Need
• Reliable
• Easy Development
• Easy Deployment
• High Efficiency
• Low Ops Load
• Cost Effective
Why ECS for Batch Processing?
Cluster Management Made Easy
Nothing to run
Complete state
Control and monitoring
Scale
Performance at Scale
Flexible Container Placement
Applications
Batch jobs
Multiple schedulers
Designed for Use with Other AWS Services
Elastic Load Balancing
Amazon Elastic Block Store
Amazon Virtual Private Cloud
AWS Identity and Access Management
AWS CloudTrail
Security
Your own EC2 instances in a VPC
with all its security features to
provide a high level of isolation.
Key Concepts
• Tasks
• Containers
• Clusters
• Container Instances
Task: A grouping of related containers
Example containers in one task:
• Nginx Web Server
• Rails Application
• MySQL Database
• Log Collector
Task Definition
{ "family" : "my-website",
  "version" : "1.0",
  "containers" : [
    <<CONTAINER DEFINITIONS>>
  ]
}
Container Definition
Names and identifies your image
Includes default runtime attributes for your container:
• Environment Variables
• Port Mappings
• Container entry point and commands
• Resource constraints
• Etc.
Example
{ "name" : "webServer",
  "image" : "nginx:latest",
  "cpu" : 512,
  "memory" : 128,
  "portMappings" : [ { "containerPort" : 9443, "hostPort" : 443 } ],
  "links" : ["rails"],
  "essential" : true
}
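The two JSON fragments above can be combined into one request. The sketch below is only an illustration, not the presenters' code: it assumes the field names of the ECS RegisterTaskDefinition API, which calls the container list `containerDefinitions` and assigns revision numbers itself rather than taking a `version` field. The `rails` container and its image name are hypothetical, added so that the `links` entry has a target.

```python
import json


def build_task_definition(family, container_definitions):
    """Assemble keyword arguments for an ECS RegisterTaskDefinition call."""
    names = [c["name"] for c in container_definitions]
    if len(names) != len(set(names)):
        raise ValueError("container names must be unique within a task")
    # "essential" defaults to true in ECS; a task needs at least one essential container.
    if not any(c.get("essential", True) for c in container_definitions):
        raise ValueError("at least one container must be essential")
    # Every link must point at another container in the same task definition.
    for c in container_definitions:
        for link in c.get("links", []):
            if link.split(":")[0] not in names:
                raise ValueError(f"{c['name']} links to unknown container {link}")
    return {"family": family, "containerDefinitions": list(container_definitions)}


web_server = {
    "name": "webServer",
    "image": "nginx:latest",
    "cpu": 512,
    "memory": 128,
    "portMappings": [{"containerPort": 9443, "hostPort": 443}],
    "links": ["rails"],
    "essential": True,
}
rails = {  # hypothetical second container so the link above resolves
    "name": "rails",
    "image": "example/rails-app:latest",
    "cpu": 256,
    "memory": 256,
    "essential": True,
}

request = build_task_definition("my-website", [web_server, rails])
print(json.dumps(request, indent=2))
# With boto3 installed and credentials configured, the request could be sent with:
#   boto3.client("ecs").register_task_definition(**request)
```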
Cluster
Provides a pool of resources for
your Tasks
A grouping of Container Instances
Starts empty, dynamically scalable
Container Instance
EC2 instance on which Tasks are scheduled
We provide an ECS-optimized AMI, or you can download the lightweight ECS agent
Registers into cluster upon launch
Different EC2 instance types for variety in resource pool
Architectural Design Patterns
Trigger Batch Processing with Lambda
[Diagram components: an object in a source Amazon S3 bucket, AWS Lambda, ecs:RunTask, Task A on Amazon ECS container instances in an Auto Scaling group across two Availability Zones, a target Amazon S3 bucket, Amazon CloudWatch, and AWS CloudTrail.]
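A minimal sketch of the Lambda side of this pattern, assuming the function is subscribed to S3 object-created notifications and passes the object location to the task as environment variables. The cluster, task definition, container, and variable names are hypothetical. The `ecs` parameter exists only so the handler can be exercised with a stand-in client; in Lambda it would default to a boto3 ECS client.

```python
from urllib.parse import unquote_plus

CLUSTER = "batch-cluster"            # hypothetical cluster name
TASK_DEFINITION = "process-object"   # hypothetical task definition family
CONTAINER_NAME = "worker"            # hypothetical container name in that task


def build_run_task_request(bucket, key):
    """Build ecs:RunTask arguments that hand the S3 location to the container."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    "environment": [
                        {"name": "SOURCE_BUCKET", "value": bucket},
                        {"name": "SOURCE_KEY", "value": key},
                    ],
                }
            ]
        },
    }


def handler(event, context, ecs=None):
    """Lambda entry point: start one ECS task per S3 record in the event."""
    if ecs is None:
        import boto3  # imported lazily so the module loads without boto3

        ecs = boto3.client("ecs")
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = ecs.run_task(**build_run_task_request(bucket, key))
        started.extend(task["taskArn"] for task in response.get("tasks", []))
    return started
```

The task itself would read `SOURCE_BUCKET` and `SOURCE_KEY`, process the object, and write its result to the target bucket.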
Fleet of workers with ECS with SQS
[Diagram components: Amazon S3, DynamoDB, Amazon Kinesis, AWS Lambda, an SQS queue, ecs:RunTask, Task A on Amazon ECS container instances in an Auto Scaling group across two Availability Zones, Amazon CloudWatch, and AWS CloudTrail.]
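The worker side of this pattern can be sketched as a polling loop that runs inside each task. This is an assumption about how such a worker might look, not code from the talk. `sqs` is expected to behave like a boto3 SQS client (`receive_message` and `delete_message` with the parameters shown), and `process` is whatever function does the batch work.

```python
def drain_queue(sqs, queue_url, process, max_batches=1):
    """Poll an SQS queue and delete each message only after it is processed.

    Returns the number of messages handled successfully.
    """
    handled = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
            WaitTimeSeconds=20,      # long polling cuts down on empty responses
        )
        for message in response.get("Messages", []):
            try:
                process(message["Body"])
            except Exception:
                # Leave the message alone: it becomes visible again after the
                # visibility timeout and another worker can retry it.
                continue
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            handled += 1
    return handled
```

Deleting only after success gives at-least-once processing, so `process` should tolerate seeing the same message twice.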
Long-running Batch Jobs
• Utilize Spot Instances
• EC2 Spot Blocks for Defined-Duration Workloads
• ECS event stream for CloudWatch Events
• Service Scaling and Monitoring
[Diagram components: Tasks A, B, and C on Amazon ECS container instances in an Auto Scaling group across two Availability Zones, Amazon CloudWatch, and AWS CloudTrail.]
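One way to use the ECS event stream for long-running jobs is to react when a task stops with a non-zero exit code. The sketch below assumes the documented shape of an "ECS Task State Change" event as delivered by CloudWatch Events (`detail.lastStatus`, `detail.containers[].exitCode`). The sample event is abbreviated and its values are made up.

```python
def failed_containers(event):
    """Return (container name, exit code) pairs for a stopped task's failures."""
    if event.get("detail-type") != "ECS Task State Change":
        return []
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return []
    return [
        (container.get("name"), container["exitCode"])
        for container in detail.get("containers", [])
        # exitCode is absent when a container never started.
        if container.get("exitCode") not in (None, 0)
    ]


sample_event = {  # abbreviated, illustrative values only
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {
        "lastStatus": "STOPPED",
        "containers": [
            {"name": "worker", "exitCode": 1},
            {"name": "log-collector", "exitCode": 0},
        ],
    },
}

for name, code in failed_containers(sample_event):
    print(f"container {name} exited with code {code}")
```

A function like this could sit in a Lambda target of a CloudWatch Events rule and resubmit or report the failed job.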
Best Practices
• Store state, inputs, and outputs in S3 or another datastore
• Minimize dependencies between task definitions (they should be independent of each other)
• Use Spot Instances and Spot fleets for long-running batch jobs
• Monitor cluster state with ECS APIs
• Share pools of resources
• Auto Scaling, VPC, IAM, scheduled Reserved Instances
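As an illustration of monitoring cluster state with the ECS APIs, the sketch below condenses a DescribeClusters response into the counters most relevant to batch capacity. It assumes the response fields of that API (`registeredContainerInstancesCount`, `runningTasksCount`, `pendingTasksCount`). The sample response is invented.

```python
def cluster_summary(describe_clusters_response):
    """Map each cluster name to its instance and task counters."""
    return {
        cluster["clusterName"]: {
            "instances": cluster.get("registeredContainerInstancesCount", 0),
            "running": cluster.get("runningTasksCount", 0),
            "pending": cluster.get("pendingTasksCount", 0),
        }
        for cluster in describe_clusters_response.get("clusters", [])
    }


sample_response = {  # illustrative values only
    "clusters": [
        {
            "clusterName": "batch-cluster",
            "status": "ACTIVE",
            "registeredContainerInstancesCount": 4,
            "runningTasksCount": 12,
            "pendingTasksCount": 3,
        }
    ]
}

print(cluster_summary(sample_response))
# With boto3, a live response could come from:
#   boto3.client("ecs").describe_clusters(clusters=["batch-cluster"])
```

A persistently non-zero `pending` count is one possible signal that the Auto Scaling group needs more container instances.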
ECS at Mapbox
Maps
Directions Geocoding
Mobile
Developer tools
Analysis
3 billion probes = 100 million miles
per day
Similar pattern for batch processing
• EC2 instances
• SQS queue
• Error handling / reporting
Introducing Watchbot
What is watchbot?
A library to help run a highly-scalable AWS service that
performs data processing tasks in response to external
events.
You provide the messages and the logic to process
them, while Watchbot handles making sure that your
processing task is run at least once for each message.
https://github.com/mapbox/ecs-watchbot
[Diagram components: SQS, ECS cluster, watcher container, running tasks.]
Your task can do anything you want!
• Your task can be anything that works in Docker
• Use any language
• Environment variables as input
• bash exit codes to indicate success/failure/retry
• Do any I/O
• Save outputs to S3 or DynamoDB
Environment Variables
Name                                Description
Subject                             the message's subject
Message                             the message's body
MessageId                           the message's ID defined by SQS
SentTimestamp                       the time the message was sent
ApproximateFirstReceiveTimestamp    the time the message was first received
ApproximateReceiveCount             the number of times the message has been received
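A task written in Python might collect these variables as sketched below. The variable names come from the table above; the `WatchbotMessage` class and the choice to keep timestamps as raw strings are assumptions made for this example, since the slides do not state the timestamp format.

```python
import os
from dataclasses import dataclass


@dataclass
class WatchbotMessage:
    """The message details a task receives through its environment."""

    subject: str
    body: str
    message_id: str
    sent_timestamp: str
    first_received_timestamp: str
    receive_count: int


def message_from_env(env=None):
    """Read the message variables from the environment (or a dict, for testing)."""
    env = os.environ if env is None else env
    return WatchbotMessage(
        subject=env.get("Subject", ""),
        body=env.get("Message", ""),
        message_id=env.get("MessageId", ""),
        # Timestamps are kept as raw strings; their format is not assumed here.
        sent_timestamp=env.get("SentTimestamp", ""),
        first_received_timestamp=env.get("ApproximateFirstReceiveTimestamp", ""),
        receive_count=int(env.get("ApproximateReceiveCount", "1")),
    )
```

`receive_count` is useful for giving up on a message that keeps failing after several deliveries.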
Messages
• Use any format as long as your task is equipped to handle it
• JSON can capture more complex
Exit Codes
Exit code   Description              Outcome
0           completed successfully   message is removed from the queue without notification
3           rejected the message     message is removed from the queue and a notification is sent
4           no-op                    message is returned to the queue without notification
other       failure                  message is returned to the queue and a notification is sent
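The exit-code table can be restated as a small lookup, which may help when unit-testing a task's exit behavior. This mirrors the table only; it is not taken from Watchbot's source, and the `Outcome` type and `outcome_for` function are names invented for the example.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    description: str
    remove_from_queue: bool  # False means the message returns to the queue
    notify: bool


_OUTCOMES = {
    0: Outcome("completed successfully", remove_from_queue=True, notify=False),
    3: Outcome("rejected the message", remove_from_queue=True, notify=True),
    4: Outcome("no-op", remove_from_queue=False, notify=False),
}
# Any other exit code counts as a failure: retry the message and notify.
_FAILURE = Outcome("failure", remove_from_queue=False, notify=True)


def outcome_for(exit_code):
    """Look up how a task's exit code is handled, per the table above."""
    return _OUTCOMES.get(exit_code, _FAILURE)
```

A task would signal one of these outcomes with, for example, `sys.exit(3)` when it decides a message is malformed and should not be retried.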
More features!
• Logging - write logs to a CloudWatch Logs log group
• Send alarms to SNS
• Reduce mode - tracks progress of distributed tasks and
runs a reduce task when everything finishes
Why not Lambda?
Watchbot is similar in many regards to AWS Lambda, but is
more configurable, more focused on data processing, and
not subject to several of Lambda's limitations.
• Full control over execution environment allows you to install anything you want
• No limits on execution time
• No memory limits
• No concurrency limits or account-wide throttling
• Trade-off: Watchbot has no DynamoDB Streams or Kinesis support
Gotcha: EBS Boot
• ECS-optimized instances are only available as EBS boot AMIs, so consider rolling your own instance store AMI
• EBS is more expensive, especially if you are running many instances on Spot
• Slower than ephemeral disks
Demo!
https://github.com/mapbox/ecs-telephone
• 14 data processing services
• 3,500 peak container instances
• 500 million compute hours used this year
Thank you!
Remember to complete
your evaluations!