Docker pipelines

Post on 23-Jan-2017


transcript

1

Dev/QA/Ops Friendly Docker Pipeline

Chris Mague / Shokunin

12/13/2016

2

Today's Talk

- The Goal
- The Problem
- The Stack
- The Process
- The Conclusion

3

Quote

“a problem well put is half solved.” ― John Dewey

4

The Goal

“We want to release more frequently”

5

The Goal – Restated as Solvable

Build a continuous delivery pipeline for the Trulia Mobile API that is usable for all stakeholders.

6

The Problem(s) – Dev Version

- my code works on the shared dev host, but not on prod
- no real visibility into what is happening in prod
- troubleshooting is difficult
- the Ops team is not helpful

7

The Problem(s) – QA Version

- code tested in QA doesn’t work in prod
- inability to test multiple builds at the same time
- no shared language to bridge the Dev/Ops teams
- the Ops team is not helpful

8

The Problem(s) – Ops Version

- Dev/Stage environments are inconsistent
- Prod environment is unreproducible
- Files are copied around in prod
- Incoming requests are difficult to parse

9

The Problems – Stated as Solvable

- Need to build a common language (culture)
- Need to build a reproducible platform in all environments (tech)
- Need to provide automation and visibility tools (tech/culture)

10

The Stack

11

Docker

- Build a reproducible/immutable(ish) platform
- Control application dependencies
- Automated build capabilities
- Low overhead compared to virtualization
- Stateless application

12

Step 1 / Base Image

- Packer instead of Dockerfiles
- Puppet to build container
- Build on Jenkins
- Vagrant option available
- Tagged with latest
- Pushed to our Docker registry

13

Step 2 / Develop Locally

- create separate run directories per environment

- modules per environment
- consul_shared

14

Local Terraform

- Sets up the Docker container
- Sources variables
- Calls the shared keys
- Uses the run_location

15

Run Location

- list of containers
- only mobileapi-base is not cached

16

Run Location

- Run supervisor
- expose port 80 as 8080
- link to dependencies
- set env vars
- mount volumes
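The run settings above map fairly directly onto `docker run` flags. The talk sets containers up through Terraform, so the following is only a rough sketch of the equivalent command line, not the author's configuration. The function name, image name, link names and paths are all hypothetical.

```python
def docker_run_args(image, name, env, links, volumes,
                    host_port=8080, container_port=80, command=()):
    """Build an argv list for `docker run`. Nothing is executed here."""
    args = ["docker", "run", "-d", "--name", name,
            # expose container port 80 on host port 8080
            "-p", f"{host_port}:{container_port}"]
    # links: alias inside the container -> name of the dependency container
    for alias, target in sorted(links.items()):
        args += ["--link", f"{target}:{alias}"]
    # environment variables
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]
    # volumes: host path -> container path
    for host_path, container_path in sorted(volumes.items()):
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    # optional process to run, e.g. a supervisor (assumed: supervisord)
    args += list(command)
    return args
```

Passing `command=("supervisord", "-n")` would keep a supervisor in the foreground, assuming supervisord is the supervisor meant on the slide.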

17

Configuration

- done in consul
- consul template to json
- creates /etc/trulia/<APPNAME>.json
- separated by environment
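On the application side, reading the rendered file could look like the sketch below. Only the `/etc/trulia/<APPNAME>.json` path comes from the slide. The function names and the idea of overriding the base directory for local runs are assumptions made for illustration.

```python
import json
from pathlib import Path


def config_path(app_name, base_dir="/etc/trulia"):
    """Path of the JSON file that consul-template renders for an app."""
    return Path(base_dir) / f"{app_name}.json"


def load_config(app_name, base_dir="/etc/trulia"):
    """Parse the rendered config; raises FileNotFoundError if it is missing."""
    with config_path(app_name, base_dir).open() as handle:
        return json.load(handle)
```

Because the file is separated by environment at render time, the application itself would not need to know which environment it is running in.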

18

Running

19

Step 3 / Kickoff

20

An aside on Jenkins

- Configure with Puppet
- Install SCM Sync Plugin
- Vanilla as possible

21

${BUILD_NUMBER}

Jenkins provides several environment variables. The build number of the software packaging job now becomes our shared key.
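A small sketch of how a build step might turn that variable into the label everyone quotes. `BUILD_NUMBER` is set by Jenkins for each build. The function name and the label format are assumptions modelled on the wording used elsewhere in the talk.

```python
import os


def shared_key(app_name, environ=None):
    """Return a label such as 'tcd-mobileapi build 12' for the current build."""
    environ = os.environ if environ is None else environ
    build = environ.get("BUILD_NUMBER")
    if not build:
        # Outside Jenkins there is no build number, so fail loudly
        raise RuntimeError("BUILD_NUMBER is not set; is this running in Jenkins?")
    return f"{app_name} build {build}"
```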

22

Communication

QA to Dev - “tcd-mobileapi(container) build 12 failed to pass smoke tests, can you please look at class foo?”

QA to Ops - “tcd-mobileapi(container) build 12 is having trouble connecting to the user database”

Ops to Dev - “after we rolled out tcd-mobileapi(container) build 12 we noticed the app_v1_userlookup(KPI) time doubled”

23

Pipeline - Package Software

- Spin up a build container
- Mount the current directory
- Pull in dependencies
- Build a .deb with FPM
- Push to aptly
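The last two packaging steps could be expressed as the commands below. This is a sketch under assumptions: the build number is used as the package version, and the staging directory, package file name and aptly repository name are hypothetical. The commands are only assembled, not run.

```python
def package_commands(app_name, build_number, staging_dir, repo="mobileapi"):
    """Return argv lists for building a .deb with FPM and adding it to aptly."""
    deb = f"{app_name}_{build_number}.deb"
    fpm = ["fpm",
           "-s", "dir",              # source type: a plain directory
           "-t", "deb",              # target type: Debian package
           "-n", app_name,           # package name
           "-v", str(build_number),  # assumed: version = Jenkins build number
           "-C", staging_dir,        # change to this directory first
           "-p", deb,                # explicit output file name
           "."]
    aptly = ["aptly", "repo", "add", repo, deb]
    return [fpm, aptly]
```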

24

25

Pipeline – Build Deployable Container

- Take base container
- Install packaged software
- Tag with build number
- Upload to registry
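As a sketch, tagging with the build number and uploading come down to two Docker commands. The registry host and function name are hypothetical, and the Dockerfile in the build context is assumed to start from the base image and install the package.

```python
def container_build_commands(app_name, build_number,
                             registry="registry.example.com", context="."):
    """Return argv lists that build and push an image tagged with the build number."""
    image = f"{registry}/{app_name}:{build_number}"
    return [
        ["docker", "build", "-t", image, context],  # build and tag in one step
        ["docker", "push", image],                   # upload to the registry
    ]
```

Using the explicit build number as the tag means any later stage can pull exactly the image that was tested.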

26

Docker tags

Be SUPER careful with latest

When in doubt, do not use it

27

Pipeline – Run in QATCD

- Spin up container in our QATCD Nomad cluster
- Run terraform to update all of the configurations in consul
- Set up credentials using Vault
- Container is now available at http://tcd-mobileapi-10.qatcd.example.com
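The hostname in the last bullet appears to combine the application name and the build number. A sketch of that naming scheme follows. The pattern is inferred from the single example URL on the slide, so treat it as an assumption.

```python
def qa_hostname(app_name, build_number, domain="qatcd.example.com"):
    """Per-build hostname, e.g. tcd-mobileapi-10.qatcd.example.com."""
    return f"{app_name}-{build_number}.{domain}"


def qa_url(app_name, build_number, domain="qatcd.example.com"):
    """URL under which a given build would be reachable in QATCD."""
    return f"http://{qa_hostname(app_name, build_number, domain)}"
```

One hostname per build is what would let several builds be tested side by side.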

28

Pipeline – Deploy Test

- health checks are crucial
  - needed for monitoring
  - needed for LB
  - needed for consul
  - get hit like 20 times/second
- engineer came up with the idea of deploy tests
  - only hit occasionally
  - more detailed
  - more resource heavy
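The difference between the two kinds of check can be sketched as follows. The function names and the shape of the dependency checks are hypothetical. The point is only that the health check stays trivial while the deploy test is allowed to do real work.

```python
def health_check():
    """Cheap check: safe to call many times a second (monitoring, LB, Consul)."""
    return {"status": "ok"}


def deploy_test(dependency_checks):
    """Expensive check: run occasionally, exercising every dependency.

    dependency_checks maps a name to a callable that raises on failure.
    Returns (passed, per-dependency results).
    """
    results = {}
    for name, check in dependency_checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # report every failure rather than stop at the first
            results[name] = f"failed: {exc}"
    passed = all(value == "ok" for value in results.values())
    return passed, results
```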

29

Pipeline – Smoke test

- Calls another Jenkins server
- Managed by the QA team
- Detailed application level test

30

Pipeline - Repointer

- allows for static hostnames for applications or external testers

- does some checking

31

Pipeline – Next Steps

1) Preprod environment
  - Push configuration LIVE
  - Run a single container with the newer version
  - Other tests run
  - Build number is put in a Jenkins form and push button
2) Release to Production
  - Put a build number in a Jenkins form
  - Only allowed if the build is on preprod
  - Containers are rolled out with sleep and concurrency set
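The production step describes a gate plus a throttled rollout. A minimal sketch of that logic is below. All names are hypothetical, and the real rollout is presumably handled by the scheduler rather than by a loop like this.

```python
import time


def can_release(build_number, preprod_builds):
    """A build may go to production only if it is already on preprod."""
    return build_number in preprod_builds


def rolling_release(containers, deploy, concurrency=2, sleep_seconds=30,
                    sleeper=time.sleep):
    """Deploy in batches of `concurrency`, pausing between batches."""
    for start in range(0, len(containers), concurrency):
        for container in containers[start:start + concurrency]:
            deploy(container)
        # no need to pause after the final batch
        if start + concurrency < len(containers):
            sleeper(sleep_seconds)
```

The pause between batches gives the dashboards time to show a regression before every container is running the new build.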

32

33

Pipeline

Dev, QA and Ops teams keep an eye on KPIs and various dashboards

QED

34

Internals

35

Nomad

- Job scheduler
- Not limited to Docker
- Integrates with Consul
- Easy setup
- Sane configuration

36

Nomad Config

37

Traefik

- HAProxy restart issue
- Performant
- Easily templatable configuration
- Nice quick front end

38

39

Vault / Consul Template

- Easily generate config files from key/value store
- Feature flags are easily implemented
- Store and filter database credentials

40

Logging

- Big challenge
- All Apache/Nginx logs include APPNAME/BUILD_NUMBER information and are in JSON format
- Application logs are in JSON format and often include unique IDs
- Stacktraces are fingerprinted
- Logstash picks up from the Nomad alloc dirs
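A sketch of what such log lines and fingerprints might look like. The field names and the normalisation rules are assumptions. The talk does not say how fingerprints are computed, so hashing a trace with its line numbers and addresses masked is just one plausible approach.

```python
import hashlib
import json
import re


def log_line(app_name, build_number, message, **fields):
    """Serialise one log record as JSON, always carrying app name and build."""
    record = {"app": app_name, "build": build_number, "message": message}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def fingerprint(stacktrace):
    """Short stable ID for a stack trace, ignoring line numbers and addresses."""
    normalised = re.sub(r"line \d+", "line N", stacktrace)
    normalised = re.sub(r"0x[0-9a-fA-F]+", "0xADDR", normalised)
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()[:12]
```

With a fingerprint attached to each error record, the same fault can be counted across builds even when line numbers shift.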

41

42

43

44

Stats / KPIs

- Data is pulled from the logs and sent to statsd→influxdb with a Grafana front end

- Host and container level stats are picked up via cAdvisor
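Sending a KPI to statsd amounts to one small UDP datagram. The sketch below uses the plain-text statsd timer format (`name:value|ms`) and the conventional statsd port 8125. The host and the use of `app_v1_userlookup` as the metric name are assumptions for illustration.

```python
import socket


def timing_packet(metric, millis):
    """Encode a statsd timer sample, e.g. b'app_v1_userlookup:42|ms'."""
    return f"{metric}:{millis}|ms".encode("ascii")


def send_timing(metric, millis, host="127.0.0.1", port=8125):
    """Fire-and-forget a timer sample to statsd over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(timing_packet(metric, millis), (host, port))
```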

45

46

47

Troubleshooting

- Devs have exec access to all containers through Vault SSH
- This is audited
- After completion of any activities the container is terminated

48

No silver bullets...

- Unit tests are slow
- Initial learning curve
- Docker on anything other than Linux is painful
- Apps need to be modified
- Less control for devs compared to old method

49

Improvements

- Better troubleshooting tools
- Shared docker host for apps with heavy upstream dependencies
- More local services to make development easier
- Better training/support for desktop Docker issues
- More code libraries to handle common app issues

50

Thanks

Kevin - AppDynamics

Sonal Joshi – Trulia Sr. Automation Engineer

Vincent Lam – Trulia Sr. Application Developer