Date post: | 14-Apr-2017 |
Category: | Technology |
Upload: | london-hashicorp-user-group |
• Scalable web capacity • Scalable load balancer capacity • Scalable service capacity • Scalable, repeatable, self-service Elasticsearch and Cassandra • Provisioning a new prod datacenter cut from 12+ months to 2 • Auto scaling, using spot capacity • Operations and Infrastructure teams:
• More efficient • More agile • Providing more business value
2
Key Results
• Wrap Terraform up to make it harder to screw up • Makefile - easiest path - make plan / make apply
• Use allowed_account_ids in providers (AWS specifically)
6
Lesson 1:
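As a rough illustration of the allowed_account_ids guard, a wrapper script could write the provider block out as Terraform JSON before `make plan` runs. This is only a sketch: the function name, the region and the account ID are hypothetical placeholders, not values from the talk.

```python
import json


def aws_provider_config(region, account_id):
    """Build an AWS provider block (Terraform JSON syntax) pinned to one account.

    With allowed_account_ids set, the AWS provider refuses to run with
    credentials that belong to any other account.
    """
    return {
        "provider": {
            "aws": {
                "region": region,
                "allowed_account_ids": [account_id],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(aws_provider_config("us-west-1", "123456789012"), indent=2))
```

Redirecting the output into a `.tf.json` file next to the other configuration would let a Makefile target regenerate it on every run.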
• External Node Classifier • Puppetmaster calls a script, which returns the node definition • Create node definition from EC2 tags
puppet::role::elasticsearch_cluster => cluster_name=reviews
• Stop needing individual hostnames!
8
puppet ENC
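A minimal sketch of what such a classifier might do with the tags once they have been fetched. The parsing convention is an assumption made for this example (a tag key starting with "puppet::" names a class, and its value holds comma-separated key=value parameters), and the "Name" tag is hypothetical. Fetching the tags from EC2 is left out.

```python
import json


def node_definition_from_tags(tags):
    """Turn EC2 tags into a Puppet ENC node definition.

    Assumed convention: a tag whose key starts with "puppet::" names a class
    (the prefix is dropped), and its value is a comma-separated list of
    key=value class parameters.
    """
    prefix = "puppet::"
    classes = {}
    for key, value in tags.items():
        if not key.startswith(prefix):
            continue  # ordinary tag, not a Puppet role
        params = {}
        for pair in filter(None, value.split(",")):
            name, _, setting = pair.partition("=")
            params[name.strip()] = setting.strip()
        classes[key[len(prefix):]] = params
    return {"classes": classes}


if __name__ == "__main__":
    # Tags as they might come back from the EC2 API for one instance.
    example_tags = {
        "Name": "es-reviews",
        "puppet::role::elasticsearch_cluster": "cluster_name=reviews",
    }
    # JSON is a subset of YAML, the format an ENC is expected to print.
    print(json.dumps(node_definition_from_tags(example_tags), indent=2))
```

Because the node definition is derived entirely from tags, nothing in it depends on the hostname of the machine.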
• My small contributions to the community • github.com/terraform-community-modules/tf_aws_ubuntu_ami • github.com/terraform-community-modules/tf_aws_availability_zones
• Modules you can reuse to stop having to hard code IDs • make + getvariables.rb pattern • We have internal versions at Yelp (usually to bake in our variables.tf.json)
9
I hate magic numbers
• Standup update from a coworker: • Yesterday: “Learned Go” • Today: “Implemented yelpaws_instance”
• Adds “ubuntu” and “region” + “account” variables to aws_instance • Looks up the AMI to use automatically • Only on initial launch, puppet converges machines after that! • https://github.com/Yelp/terraform-ami_fromhttp
10
yelpaws_instance
• Modules • Don’t put your modules in a ../modules folder in the same repo. • Make them separate repositories, and lock SHAs/tags to avoid surprises! • Don’t deeply nest modules - pass a module everything it needs
• Code • type/region-environment layout
• vpc/uswest1-prod/subnets.tf • web_frontend/uswest1-prod/webs.tf • terraform.tfvars
12
Code layout
• Build your VPC, subnets etc with terraform • Export as remote state • Pull in elsewhere - eliminate magic numbers • Much nicer solution than getvariables.rb
13
Remote state
• nsone is an awesome DNS service! • They have a fantastic API
• I wrote my own Terraform provider! • github.com/bobtfish/terraform-provider-nsone • Tie together resources from multiple regions using remote state!
15
nsone
github.com/Yelp/terraform-provider-gitfile
• Checkout git repository • Generate a file from a template • Commit + Push
puppet/modules/zookeeper_cluster/data/cluster/xxxxx.yaml
17
gitfile
• Puppet code:
class { 'role::elasticsearch_cluster':
  cluster_name => 'reviews',
}
• Hiera lookups: puppet/modules/elasticsearch_cluster/data/cluster/reviews.yaml
• Can locate the ‘data’ directory somewhere else
18
puppet data as modules
• puppet/modules/elasticsearch_clusters/data/cluster/reviews.yaml • Move the cluster data folder out of puppet • Add YAML for mapping of region/environment/number of nodes • Generate terraform config (as JSON)
• Simple config • Directly creating ASGs • No modules • Easy to debug!
• Automated cluster provisioning! (Just add Jenkins)
20
Managing Elasticsearch/Cassandra etc
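The generation step could look roughly like the sketch below: one small class per cluster deployment, serialized straight to an aws_autoscaling_group resource in Terraform JSON. The class name, the resource naming scheme, the launch configuration name and the subnet ID are all assumptions made for illustration; this is not Yelp's actual generator.

```python
import json
from dataclasses import dataclass


@dataclass
class ClusterDeployment:
    """One cluster in one region/environment (fields mirror the YAML mapping)."""
    cluster_name: str
    region: str
    environment: str
    nodes: int

    def to_terraform(self, launch_configuration, subnet_ids):
        """Serialize to a single aws_autoscaling_group resource, no modules."""
        name = f"elasticsearch-{self.cluster_name}-{self.region}-{self.environment}"
        return {
            "resource": {
                "aws_autoscaling_group": {
                    name: {
                        "name": name,
                        "min_size": self.nodes,
                        "max_size": self.nodes,
                        "desired_capacity": self.nodes,
                        "launch_configuration": launch_configuration,
                        "vpc_zone_identifier": subnet_ids,
                    }
                }
            }
        }


if __name__ == "__main__":
    deployment = ClusterDeployment("reviews", "uswest1", "prod", nodes=3)
    # Launch configuration name and subnet ID are placeholders.
    config = deployment.to_terraform("es-reviews-lc", ["subnet-00000000"])
    print(json.dumps(config, indent=2, sort_keys=True))
```

Keeping the domain logic in plain classes and emitting flat resources means the generated JSON can be read and diffed directly when something goes wrong.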
• Bad abstraction for contextual information • Which db server is the master? Does it have ‘master’ in its FQDN? • If it does, what happens when you promote another machine?
• Need key => value for cattle not pets
• Customize your monitoring system to actually tell you what’s wrong! • ‘The master DB has crashed’ vs ‘A db has crashed’ • ‘10-46-11-54 is dead’ vs ‘zookeeper::10-46-11-54 is dead’
21
Hostnames
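To make the monitoring point concrete, here is a small sketch of building an alert from key => value metadata instead of from the hostname. The "role" key and both function names are assumptions for this example.

```python
def describe_host(hostname, tags):
    """Prefix a hostname with its role so alerts carry context.

    `tags` is the key => value metadata attached to the machine; the "role"
    key is an assumed name for this sketch.
    """
    role = tags.get("role")
    return f"{role}::{hostname}" if role else hostname


def alert(hostname, tags):
    """Format the message a monitoring check would send for a dead host."""
    return f"{describe_host(hostname, tags)} is dead"
```

For example, `alert("10-46-11-54", {"role": "zookeeper"})` yields the contextual form of the message, while a host without a role falls back to the bare hostname.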
• Smartstack • Nerve (on host, monitors services) • Synapse (run a haproxy on lo:0) • Hacheck (cache healthcheck results to rate limit) • qdisc_tools (seamless haproxy reloads)
• yocalhost: 169.254.255.254 • Reachable from the machine • Reachable from inside Docker • Each service has a fixed port
22
Service discovery
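From a client's point of view, the yocalhost convention reduces discovery to a lookup of a fixed port. A minimal sketch, assuming a hypothetical registry (the service names and port numbers below are made up):

```python
import socket

YOCALHOST = "169.254.255.254"

# Hypothetical registry: service names and port numbers are made up.
SERVICE_PORTS = {
    "example_service": 20001,
    "another_service": 20002,
}


def service_address(name):
    """Return the (host, port) a client uses to reach a service via local haproxy."""
    return (YOCALHOST, SERVICE_PORTS[name])


def connect(name, timeout=1.0):
    """Open a TCP connection to the service through the local haproxy."""
    return socket.create_connection(service_address(name), timeout=timeout)
```

The same address works on the host and inside a container, so client code does not need to know where it is running.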
• Terraform is really, really young. • It has some serious issues and limitations currently
23
The bad news
The good news
• It’s moving really fast • None of the fixes needed fundamentally change the model
• Unfortunately, provider aliases don’t work in terraform modules • We want to provision all ‘prod’ ES clusters in one shot • So we just generate raw terraform resources, without using a module
• Works, but it’d be nice to have more separation • ‘Make all the Elasticsearch clusters’ • ‘Make an individual Elasticsearch cluster’
• Should be separate concerns IMO
25
Multi region
• “Terraform is really hard to debug” • Modules make this 10x worse. • TF_LOG=1 is useful for provider authors. • NOT useful for Terraform users
26
Debugging
output "thing_ids" {
  value = "${join(",", aws_instance.foo.*.id)}"
}
${split(",", module.foo.thing_ids)}
const stringListDelim = `B780FFEC-B661-4EB8-9236-A01737AD98B6`
27
Data structures
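The weakness of the join/split workaround is easy to show outside Terraform. The sketch below is an illustration in Python, not Terraform's implementation: a list flattened with a comma stops round-tripping as soon as an element contains a comma, and a delimiter that cannot plausibly occur in the data (such as the UUID-style constant on the slide) avoids the clash. The helper names and sample values are hypothetical.

```python
def join_list(items, delim=","):
    """Flatten a list into one string, as the join() workaround does."""
    return delim.join(items)


def split_list(value, delim=","):
    """Recover the list on the other side of the module boundary."""
    return value.split(delim) if value else []


if __name__ == "__main__":
    ids = ["i-0aaa", "i-0bbb"]
    assert split_list(join_list(ids)) == ids  # round-trips cleanly

    # An element containing the delimiter no longer round-trips.
    names = ["web,primary", "web,secondary"]
    assert split_list(join_list(names)) != names

    # A delimiter that cannot plausibly occur in the data avoids the clash.
    unlikely = "B780FFEC-B661-4EB8-9236-A01737AD98B6"
    assert split_list(join_list(names, unlikely), unlikely) == names
```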
• Lots of corner cases where they don’t work. • Some cases where they work sometimes
28
Counts and Interpolation
• Don’t try to put your domain logic into Terraform! • Write some (simple!) classes for your domain • Make them serialize out to Terraform resources in JSON • Done!
29
KISS
• 0.7 will fix some of my biggest complaints
• Ability to move state • Enables refactoring existing resources into modules
• Complex data structure support • No more split() join()
30
Terraform 0.7
• Twitter: @bobtfish • IRC: #terraform (t0m) • github.com/bobtfish • github.com/Yelp • github.com/terraform-community-modules
31
Thanks