
London Hug 19/5 - Terraform in Production

Date post: 14-Apr-2017
Upload: london-hashicorp-user-group

Terraform in production

• Scalable web capacity
• Scalable load balancer capacity
• Scalable service capacity
• Scalable, repeatable, self-service Elasticsearch and Cassandra
• Provisioning a new prod datacenter cut from 12+ months to 2
• Auto scaling, using spot capacity
• Operations and Infrastructure teams:
  • More efficient
  • More agile
  • Providing more business value

2

Key Results

3

4

5

• Wrap Terraform up to make it harder to screw up
  • Makefile - easiest path - make plan / make apply
• Use allowed_account_ids in providers (AWS specifically)
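As a rough illustration of the allowed_account_ids guard, here is a minimal Python sketch that emits an AWS provider block in Terraform's JSON syntax. The helper name, region and account ID are placeholders, not values from the talk.

```python
import json


def aws_provider(region, account_id):
    """Return a Terraform JSON provider block pinned to one AWS account."""
    return {
        "provider": {
            "aws": {
                "region": region,
                # Terraform refuses to run if the active credentials
                # belong to an account that is not in this list.
                "allowed_account_ids": [account_id],
            }
        }
    }


# Placeholder region and account ID, for illustration only.
print(json.dumps(aws_provider("us-west-1", "123456789012"), indent=2))
```

With this in place, running make plan with the wrong credentials loaded should fail at the provider instead of planning changes against the wrong account.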

6

Lesson 1:

7

• External Node Classifier
  • Puppetmaster calls a script, returns node definition
  • Create node definition from EC2 tags

puppet::role::elasticsearch_cluster => cluster_name=reviews

• Stop needing individual hostnames!
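A minimal sketch of how such a classifier might turn EC2 tags into a node definition. It assumes tag keys carry a "puppet::" prefix and tag values hold comma-separated name=value pairs, as the example tag above suggests. Fetching the tags from EC2 is left out, and the function name and the "Name" tag are hypothetical.

```python
import json

TAG_PREFIX = "puppet::"  # assumed prefix marking tags that name Puppet classes


def node_definition(tags):
    """Build an ENC node definition (classes plus parameters) from EC2 tags."""
    classes = {}
    for key, value in tags.items():
        if not key.startswith(TAG_PREFIX):
            continue  # ignore tags unrelated to Puppet
        class_name = key[len(TAG_PREFIX):]
        params = {}
        # Assumed value format: "name=value" pairs separated by commas.
        for pair in filter(None, value.split(",")):
            name, _, param_value = pair.partition("=")
            params[name.strip()] = param_value.strip()
        classes[class_name] = params
    return {"classes": classes}


# Hypothetical tag set for one instance.
tags = {
    "Name": "es-reviews",
    "puppet::role::elasticsearch_cluster": "cluster_name=reviews",
}
print(json.dumps(node_definition(tags)))
```

An ENC script prints its node definition as YAML on stdout. JSON is used here only to avoid a YAML dependency, and a real script may need to emit YAML proper.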

8

puppet ENC

• My small contributions to the community
  • github.com/terraform-community-modules/tf_aws_ubuntu_ami
  • github.com/terraform-community-modules/tf_aws_availability_zones
• Modules you can reuse to stop having to hard code IDs
• make + getvariables.rb pattern
• We have internal versions at Yelp (usually to bake in our variables.tf.json)

9

I hate magic numbers

• Standup update from a coworker:
  • Yesterday: “Learned Go”
  • Today: “Implemented yelpaws_instance”
• Adds “ubuntu” and “region” + “account” variables to aws_instance
• Looks up the AMI to use automatically
• Only on initial launch, puppet converges machines after that!
• https://github.com/Yelp/terraform-ami_fromhttp

10

yelpaws_instance

11

• Modules
  • Don’t put your modules in a ../modules folder in the same repo.
  • Make them separate repositories, and lock SHAs/tags to avoid surprises!
  • Don’t deeply nest modules - pass a module everything it needs
• Code
  • type/region-environment layout
    • vpc/uswest1-prod/subnets.tf
    • web_frontend/uswest1-prod/webs.tf
    • terraform.tfvars

12

Code layout

• Build your VPC, subnets etc with terraform
• Export as remote state
• Pull in elsewhere - eliminate magic numbers
• Much nicer solution than getvariables.rb
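A sketch of the "pull in elsewhere" step, assuming Terraform 0.7 or later, where remote state is read through the terraform_remote_state data source, and assuming state stored in S3. The helper name, bucket and key are hypothetical.

```python
import json


def remote_state(name, bucket, key, region):
    """Return Terraform JSON that reads another stack's remote state from S3."""
    return {
        "data": {
            "terraform_remote_state": {
                name: {
                    "backend": "s3",
                    "config": {"bucket": bucket, "key": key, "region": region},
                }
            }
        }
    }


# Hypothetical bucket and key; the key mirrors the type/region-environment layout.
config = remote_state("vpc", "example-terraform-state",
                      "vpc/uswest1-prod/terraform.tfstate", "us-west-1")
print(json.dumps(config, indent=2))
```

Outputs exported by the VPC stack can then be interpolated by name in the consuming stack, so subnet and VPC IDs never need to be copied by hand.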

13

Remote state

14

“remote” state

• nsone is an awesome DNS service!
  • They have a fantastic API
• I wrote my own Terraform provider!
  • github.com/bobtfish/terraform-provider-nsone
• Tie together resources from multiple regions using remote state!

15

nsone

16

nsone

github.com/Yelp/terraform-provider-gitfile

• Checkout git repository
• Generate a file from a template
• Commit + Push

puppet/modules/zookeeper_cluster/data/cluster/xxxxx.yaml

17

gitfile

• Puppet code:

class { 'role::elasticsearch_cluster':
  cluster_name => 'reviews',
}

• Hiera lookups: puppet/modules/elasticsearch_cluster/data/cluster/reviews.yaml

• Can locate the ‘data’ directory somewhere else

18

puppet data as modules

• Spot fleet Terraform provider in use internally
• ‘Coming soon’ to github

19

Spot fleet

• puppet/modules/elasticsearch_clusters/data/cluster/reviews.yaml
  • Move the cluster data folder out of puppet
  • Add YAML for mapping of region/environment/number of nodes
  • Generate terraform config (as JSON)
• Simple config
  • Directly creating ASGs
  • No modules
  • Easy to debug!

• Automated cluster provisioning! (Just add Jenkins)
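The generation step could look roughly like the sketch below. It assumes the YAML mapping has already been loaded into a dict of region, then environment, then node count. The resource naming scheme, the launch configuration reference and the subnet variable are all hypothetical.

```python
import json


def cluster_asgs(cluster_name, mapping):
    """Turn {region: {environment: node_count}} into Terraform JSON ASG resources."""
    asgs = {}
    for region, environments in sorted(mapping.items()):
        for environment, nodes in sorted(environments.items()):
            resource_name = "%s_%s_%s" % (cluster_name, region, environment)
            asgs[resource_name] = {
                "name": resource_name,
                # Fixed-size group: min, max and desired are all the node count.
                "min_size": nodes,
                "max_size": nodes,
                "desired_capacity": nodes,
                # Hypothetical references to resources defined elsewhere.
                "launch_configuration":
                    "${aws_launch_configuration.%s.name}" % cluster_name,
                "vpc_zone_identifier": ["${var.subnet_id}"],
            }
    return {"resource": {"aws_autoscaling_group": asgs}}


# Hypothetical mapping, as it might look after loading the cluster YAML.
mapping = {"uswest1": {"prod": 3, "stage": 1}}
print(json.dumps(cluster_asgs("reviews", mapping), indent=2, sort_keys=True))
```

Because the output is flat JSON with one resource per region and environment, a diff of the generated file shows exactly what a change to the YAML will do.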

20

Managing Elasticsearch/Cassandra etc

• Bad abstraction for contextual information
  • Which db server is the master? Does it have ‘master’ in its FQDN?
  • If it does, what happens when you promote another machine?
• Need key => value for cattle not pets
• Customize your monitoring system to actually tell you what’s wrong!
  • ‘The master DB has crashed’ vs ‘A db has crashed’
  • ‘10-46-11-54 is dead’ vs ‘zookeeper::10-46-11-54 is dead’
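One way to picture the key => value idea: the alert text is built from a host's tags, not from anything encoded in its hostname. This is a small sketch only; the "service" tag key and the function name are assumptions.

```python
def dead_host_alert(ip_address, tags):
    """Format an alert using key => value metadata, not the hostname."""
    host = ip_address.replace(".", "-")
    service = tags.get("service")  # hypothetical tag key
    if service:
        return "%s::%s is dead" % (service, host)
    return "%s is dead" % host


print(dead_host_alert("10.46.11.54", {"service": "zookeeper"}))
print(dead_host_alert("10.46.11.54", {}))
```

Promoting a different machine then means changing a tag, with no hostname to rename.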

21

Hostnames

• Smartstack
  • Nerve (on host, monitors services)
  • Synapse (run a haproxy on lo:0)
  • Hacheck (cache healthcheck results to rate limit)
  • qdisc_tools (seamless haproxy reloads)
• yocalhost: 169.254.255.254
  • Reachable from the machine
  • Reachable from inside Docker
  • Each service has a fixed port

22

Service discovery

• Terraform is really, really young.
• It has some serious issues and limitations currently

23

The bad news


The good news

• It’s moving really fast
• None of the things needed fundamentally change the model

• Unfortunately, provider aliases don’t work in terraform modules
• We want to provision all ‘prod’ ES clusters in one shot
• So we just generate raw terraform resources, without using a module
• Works, but it’d be nice to have more separation
  • ‘Make all the Elasticsearch clusters’
  • ‘Make an individual Elasticsearch cluster’

• Should be separate concerns IMO

25

Multi region

• “Terraform is really hard to debug”
  • Modules make this 10x worse.
• TF_LOG=1 is useful for provider authors.
  • NOT useful for Terraform users

26

Debugging

output "thing_ids" {
  value = "${join(",", aws_instance.foo.*.id)}"
}

${split(",", module.foo.thing_ids)}

const stringListDelim = `B780FFEC-B661-4EB8-9236-A01737AD98B6`
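The join/split round trip and its weak spot can be shown in a few lines of Python. This only illustrates the string-encoding idea and is not Terraform's implementation. The UUID on the slide is presumably used as a delimiter that is very unlikely to occur inside real values. The helper names and sample values are hypothetical.

```python
UUID_DELIM = "B780FFEC-B661-4EB8-9236-A01737AD98B6"  # constant shown on the slide


def join_list(items, delim=","):
    """Flatten a list into one string, as join() does in an output."""
    return delim.join(items)


def split_list(value, delim=","):
    """Recover the list on the consuming side, as split() does."""
    return value.split(delim)


ids = ["i-0a1", "i-0b2"]  # placeholder instance IDs
print(split_list(join_list(ids)))  # round trip preserves the list

# A value that contains the delimiter does not survive the round trip.
tricky = ["a,b", "c"]
print(split_list(join_list(tricky)))  # three items come back, not two

# An improbable delimiter avoids the collision.
print(split_list(join_list(tricky, UUID_DELIM), UUID_DELIM))
```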

27

Data structures

• Lots of corner cases where they don’t work.
• Some cases where they work sometimes

28

Counts and Interpolation

• Don’t try to put your domain logic into Terraform!
• Write some (simple!) classes for your domain
• Make them serialize out to Terraform resources in JSON
• Done!
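A minimal sketch of the "simple classes that serialize to JSON" approach, using a Python dataclass. The Cluster class, its fields, the instance type and the var.ami_id reference are hypothetical. Only the aws_launch_configuration resource type and its attributes come from the AWS provider.

```python
import json
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical domain object: one cluster and the machines it runs on."""
    name: str
    instance_type: str

    def to_terraform(self):
        """Serialize this cluster to a Terraform JSON launch configuration."""
        return {
            "resource": {
                "aws_launch_configuration": {
                    self.name: {
                        "name_prefix": "%s-" % self.name,
                        "image_id": "${var.ami_id}",  # hypothetical variable
                        "instance_type": self.instance_type,
                    }
                }
            }
        }


print(json.dumps(Cluster("reviews", "m4.large").to_terraform(), indent=2))
```

Domain rules such as naming or sizing live in ordinary, testable code, and Terraform only ever sees plain resources.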

29

KISS

• 0.7 will fix some of my biggest complaints

• Ability to move state
  • Enables refactoring existing resources into modules
• Complex data structure support
  • No more split() join()

30

Terraform 0.7

• Twitter: @bobtfish
• IRC: #terraform (t0m)
• github.com/bobtfish
• github.com/Yelp
• github.com/terraform-community-modules

31

Thanks

