Cloud patterns applied

Post on 05-Aug-2015




Making the most of EC2 at EyeEm

• Site Reliability Engineer at EyeEm

• How do computers even work?

• Started as an operations guy in a scientific datacenter

• Now mostly developing and making users and developers happy



Resilience, Development, Culture.

—Paul Hammond

“If you think you can prevent failure, then you aren’t developing your ability to respond.”

• Have as few machines as possible containing application state

• Test restores of stateful machines

• …all the time.

• Throw away stateless servers

• Make sure they can come up again with their expected behaviour

—John Allspaw, Richard Cook

“The goal of operations is to have every day be just another boring day. Achieving this boredom depends on foreseeing the future performance of the system and making adjustments accordingly.”

• Distributed datastores

• Many small servers, rather than few big

[Figure: three instances, each serving 33% of the traffic. When one fails, the two survivors each serve 50%.]

50% traffic increase on a single instance

[Figure: eight instances, each serving 12.5% of the traffic. When one fails, the seven survivors each serve about 14.3%.]

~14.3% traffic increase on a single instance
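The arithmetic behind the two figures can be sketched in a few lines of Python. This assumes traffic is spread perfectly evenly and that exactly one instance fails. The function names are illustrative only.

```python
from fractions import Fraction


def share_per_instance(instances: int) -> Fraction:
    """Fraction of total traffic each instance serves under an even split."""
    if instances < 1:
        raise ValueError("need at least one instance")
    return Fraction(1, instances)


def increase_after_one_failure(instances: int) -> Fraction:
    """Relative load increase on each survivor when one instance fails."""
    if instances < 2:
        raise ValueError("need at least two instances to survive a failure")
    before = share_per_instance(instances)
    after = share_per_instance(instances - 1)
    return after / before - 1


if __name__ == "__main__":
    for n in (3, 8):
        extra = float(increase_after_one_failure(n))
        print(f"{n} instances: +{extra:.1%} load per survivor")
```

With n instances the increase works out to 1/(n-1): one half for three instances, one seventh (about 14.3%) for eight.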

• Mark endpoints dead

• Test timeouts
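One way a client could mark endpoints dead is sketched below. This is not EyeEm's implementation. The class name `EndpointPool`, the `retry_after` parameter and the injectable `clock` are all assumptions made for illustration. The injectable clock is there so that timeout behaviour can be tested without waiting.

```python
import time
from typing import Callable, Dict, List, Optional


class EndpointPool:
    """Round-robin over endpoints, skipping ones recently marked dead."""

    def __init__(self, endpoints: List[str], retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._endpoints = list(endpoints)
        self._retry_after = retry_after
        self._clock = clock
        self._dead_until: Dict[str, float] = {}
        self._next = 0

    def mark_dead(self, endpoint: str) -> None:
        """Take an endpoint out of rotation for `retry_after` seconds."""
        self._dead_until[endpoint] = self._clock() + self._retry_after

    def is_alive(self, endpoint: str) -> bool:
        deadline = self._dead_until.get(endpoint)
        return deadline is None or self._clock() >= deadline

    def pick(self) -> Optional[str]:
        """Return the next live endpoint, or None if all are marked dead."""
        for _ in range(len(self._endpoints)):
            candidate = self._endpoints[self._next]
            self._next = (self._next + 1) % len(self._endpoints)
            if self.is_alive(candidate):
                return candidate
        return None
```

A caller would invoke `mark_dead` whenever a request to an endpoint times out, and treat a `None` from `pick` as "no backend available".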

• Single responsibility servers / services

• Security Groups define the interface through which services are supposed to talk to one another

• …and can be used to assign server roles.

[Diagram, built up over several slides: security groups as the interface between services]

production branch=master

• Backend Security Group role=backend

• Database Security Group role=database: Allow Inbound Backend 3306

• Redis Security Group: Allow Inbound Backend 6379

• Base Security Group

• Metrics Security Group: Allow Inbound Base 8125

feature_x staging branch=feature_x

• Backend Security Group role=backend

• Database Security Group role=database: Allow Inbound Backend 3306

• Redis Security Group: Allow Inbound Backend 6379

• Base Security Group
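How a role could be read off a machine's security groups is sketched below. It assumes each group is available as a dictionary with a `Tags` list of `Key`/`Value` pairs, the shape tags take in a CloudFormation template. The helper names `tags_to_dict` and `role_for` are hypothetical, and fetching the groups from the EC2 API is left out.

```python
from typing import Any, Dict, Iterable, List, Optional


def tags_to_dict(tags: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Flatten [{'Key': k, 'Value': v}, ...] into {k: v}."""
    return {tag["Key"]: tag["Value"] for tag in tags}


def role_for(groups: List[Dict[str, Any]]) -> Optional[str]:
    """Return the role from the first security group that carries a role tag."""
    for group in groups:
        role = tags_to_dict(group.get("Tags", [])).get("role")
        if role:
            return role
    return None


if __name__ == "__main__":
    groups = [
        {"GroupName": "base", "Tags": []},
        {"GroupName": "backendsg", "Tags": [
            {"Key": "role", "Value": "backend"},
            {"Key": "branch", "Value": "feature_x"},
        ]},
    ]
    print(role_for(groups))
```

A group without a role tag, such as the base group here, leaves the role untouched, so a machine can sit in several groups and still end up with a single role.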

{
  "Outputs": {
    "ApiEndpoint": {
      "Description": "DNS Endpoint to feature_xAPI staging",
      "Value": {"Ref": "apiendpoint"}
    },
    "backend1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PrivateDnsName"]}
    },
    "backend1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PublicDnsName"]}
    },
    "db1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PrivateDnsName"]}
    },
    "db1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PublicDnsName"]}
    },
    "redis1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PrivateDnsName"]}
    },
    "redis1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PublicDnsName"]}
    }
  },
  "Resources": {
    "apiendpoint": {
      "Properties": {
        "HostedZoneId": "Z3HTG0V9588TAA",
        "Name": "",
        "ResourceRecords": [{"Fn::GetAtt": ["backend1", "PublicDnsName"]}],
        "TTL": 300,
        "Type": "CNAME"
      },
      "Type": "AWS::Route53::RecordSet"
    },
    "backend1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-f2191786",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "backendsg"}, "puppeteers"],
        "Tags": [
          {"Key": "background_tasks", "Value": "false"},
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "jenkins_access", "Value": ""},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "service_discovery", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl | apt-key add -\necho \"deb $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/ >\nexport puppetbranch=$(python puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./\nfi\nbash ./\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "backend1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "backend1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "backend1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "backend1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "backendsg": {
      "Properties": {
        "GroupDescription": "backend",
        "SecurityGroupIngress": [
          {"CidrIp": "", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"},
          {"CidrIp": "", "FromPort": "80", "IpProtocol": "tcp", "ToPort": "80"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "backend"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "db1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "dbprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "dbsg"}, "puppeteers"],
        "Tags": [
          {"Key": "restore_from_extract", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl | apt-key add -\necho \"deb $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/ >\nexport puppetbranch=$(python puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./\nfi\nbash ./\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "db1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "db1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "db1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "db1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "dbprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["extract-access"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "dbsg": {
      "Properties": {
        "GroupDescription": "db",
        "SecurityGroupIngress": [
          {"FromPort": "3306", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "3306"},
          {"CidrIp": "", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "db"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "puppetprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["puppet-provisioning"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "redis1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "redissg"}, "puppeteers"],
        "Tags": [],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl | apt-key add -\necho \"deb $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/ >\nexport puppetbranch=$(python puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./\nfi\nbash ./\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "redis1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "redis1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "redis1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "redis1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "redissg": {
      "Properties": {
        "GroupDescription": "redis",
        "SecurityGroupIngress": [
          {"FromPort": "6379", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "6379"},
          {"CidrIp": "", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "redis"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    }
  }
}



“JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.”

eyeemstack create --machines backend db feed --restore_db extract --branch feature_x

• Python tool on top of troposphere, a Python library for generating CloudFormation templates
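The idea of generating the template from a short machine list can be sketched as follows. This is not the eyeemstack code. To stay dependency-free it builds plain dictionaries instead of using troposphere, and it covers only the security groups. The function names and the `BACKEND_PORTS` table are assumptions. The ports, tag keys and `SourceSecurityGroupName` reference mirror the template on the slides.

```python
import json
from typing import Any, Dict, List

# Ports the backend is allowed to reach, as shown on the slides.
BACKEND_PORTS = {"db": "3306", "redis": "6379"}


def security_group(name: str, branch: str) -> Dict[str, Any]:
    """Describe one security group, tagged with its role and branch."""
    ingress = []
    if name in BACKEND_PORTS:
        port = BACKEND_PORTS[name]
        ingress.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "SourceSecurityGroupName": {"Ref": "backendsg"},
        })
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": name,
            "SecurityGroupIngress": ingress,
            "Tags": [
                {"Key": "branch", "Value": branch},
                {"Key": "role", "Value": name},
            ],
        },
    }


def build_template(machines: List[str], branch: str) -> Dict[str, Any]:
    """Build the Resources section for the requested machines."""
    needs_backend = [m for m in machines if m in BACKEND_PORTS]
    if needs_backend and "backend" not in machines:
        raise ValueError(f"{needs_backend} reference backendsg; add 'backend'")
    return {
        "Resources": {f"{m}sg": security_group(m, branch) for m in machines}
    }


if __name__ == "__main__":
    print(json.dumps(build_template(["backend", "db"], "feature_x"), indent=2))
```

Keeping the port table in one place means a new service only needs one extra entry, and the generated JSON never has to be edited by hand.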

vagrant up backend db feed

# Profile: deploy the backend build that matches this machine's branch.
class eyeem::profiles::backend::deploy {
  eyeem::deploy_codebase { "backend":
    directory => '/var/www/backend',
    bucket    => 'eyeem-web-backend',
    filename  => "backend-${::branch}.tar.gz",
    restart   => ['nginx', 'php5-fpm']
  }
}

define eyeem::deploy_codebase (
  $prefix = '',
  $directory,
  $bucket,
  $filename,
  $restart ) {

  # On Vagrant the code is mounted into the machine, so nothing is deployed.
  if (member($::mountpoints, "${directory}/current") and $::environment == 'local') {
    notice("Looks like we are on Vagrant and you mounted the code in, skipping deploy.")
  } else {
    ( . . . )
  }
}

• ~70 cents for a single test run.

• ~$3.50 per workday.

• ~$17.64 per day for an always-on staging.

• Tests disaster recovery on a sample dataset.

• Scalable setup.

• < 10 minutes

• Stagings just a click away.
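The cost figures above can be related to each other with a little arithmetic. The three amounts come from the slides. Reading the workday figure as a number of test runs and the always-on figure as 24 hours of the same stack is an assumption.

```python
# Amounts from the slides, in cents to keep the arithmetic exact.
TEST_RUN_CENTS = 70          # one test run
WORKDAY_CENTS = 350          # on-demand stagings, per workday
ALWAYS_ON_DAY_CENTS = 1764   # always-on staging, per day


def runs_per_workday() -> float:
    """How many test runs the workday figure pays for."""
    return WORKDAY_CENTS / TEST_RUN_CENTS


def always_on_hourly_cents() -> float:
    """Hourly cost of the always-on staging, assuming a 24-hour day."""
    return ALWAYS_ON_DAY_CENTS / 24


def daily_saving_cents() -> int:
    """Saving per workday of on-demand stagings versus always-on."""
    return ALWAYS_ON_DAY_CENTS - WORKDAY_CENTS


if __name__ == "__main__":
    print(f"{runs_per_workday():.0f} runs per workday")
    print(f"{always_on_hourly_cents():.1f} cents per always-on hour")
    print(f"{daily_saving_cents() / 100:.2f} dollars saved per workday")
```

Under these assumptions the workday figure corresponds to five test runs, the always-on staging costs about 73.5 cents an hour, and on-demand stagings save roughly $14 per workday.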

[Diagram, built up over two slides: new services are added as their own security groups]

production branch=master

• Backend Security Group role=backend

• Base Security Group

• Metrics Security Group: Allow Inbound Base 8125

feature_x staging branch=feature_x

• Backend Security Group role=backend

• Base Security Group

New service groups

• Inventory Service Security Group role=inventory

• Jobrunner Service Security Group role=jobrunner

• ~350 Job Executions last month

• That is ~350 self-service operations

• Stagings everywhere

• Definition of Done: Can you boot it up using EyeEmStack and Vagrant?

• Lots of 99.999%s

• “Everything fails all the time.”

• Test your repairs, automate everything.

• Distribute your data.

• Applications should be able to handle state transitions of the services they are built from, and to diagnose failure.

• Design your infrastructure to act as a service provider for your developers.

