Running a production Jenkins instance
Harpreet Singh, Senior Director, Product Management
Kohsuke Kawaguchi, Jenkins founder
©2012 CloudBees, Inc. All Rights Reserved
Agenda
• Failures – a fact of life
  – Getting ready for failures
  – Preventing failures
  – Debugging failures
• Run an efficient Jenkins installation
Day: A period of 24 hours, mostly misspent…
CloudBees – Who are we?
• Jenkins founder on-board
• Key Jenkins contributors on-board
• Built Jenkins as a Service
• Run the biggest Jenkins installation anywhere (2k+ masters)
CloudBees’ Mission - Eliminate Downtime
• Eliminate time wasted due to
  – Jenkins issues
  – User issues
  – Lack of right tools…
• Improve efficiency for administrators and developers
• Rely on Jenkins…
Good Management of Jenkins
• Organize jobs better
• Secure your jobs
• Replicate good practices
• Respond quicker to requests
• Ensure compliance
• Bounce back from failures
• Prevent failures
• Everything should be as fast as possible…if not faster
Recovering from failures
High Availability, Backing up
Backing up Jenkins
Problem: Disk Failures
• JENKINS_HOME
  – Plugins, users, jobs… everything
Solution: Back it up
• Push HOME to a repo
  – HOME tends to be large
  – Commit only vital info
  – Run nightly
• Push to S3
Jenkins Enterprise Solution
• Backup plugin
• Backup-to-cloud
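As a rough illustration of "commit only vital info", the sketch below archives just the configuration files under JENKINS_HOME and skips bulky build output. The glob patterns in `VITAL_PATTERNS` and the function names are assumptions made for this example, not a list taken from the slides; adjust them to what your installation actually needs in order to restore.

```python
import tarfile
from pathlib import Path
from typing import List

# Assumption: these patterns are an illustrative guess at "vital info".
VITAL_PATTERNS = ("*.xml", "jobs/*/config.xml", "users/*/config.xml")


def vital_files(jenkins_home: Path) -> List[Path]:
    """Return the files under jenkins_home matched by VITAL_PATTERNS, sorted."""
    found = set()
    for pattern in VITAL_PATTERNS:
        found.update(p for p in jenkins_home.glob(pattern) if p.is_file())
    return sorted(found)


def write_backup(jenkins_home: Path, archive: Path) -> int:
    """Write the vital files to a gzipped tarball; return how many were stored."""
    files = vital_files(jenkins_home)
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            # Store paths relative to JENKINS_HOME so a restore is a plain untar.
            tar.add(path, arcname=path.relative_to(jenkins_home).as_posix())
    return len(files)
```

Run nightly (from cron or from a Jenkins job), the resulting archive is small enough to commit to a repository or push to S3.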
JE Backup Plugin
• Backup as a Jenkins job
• What to back up
  – Job configuration
  – Build records
  – System configuration
    • Plugin binaries, plugin configs etc.
    • Everything except jobs
• Where to back up
  – Local directory
  – SFTP server
  – WebDAV
• Retention policy
  – All
  – Last N
  – Exponential decay
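The slides do not say how the plugin implements "exponential decay", so the following is only one plausible reading of the idea: keep the newest backup in each age bucket, where the buckets double in width (0 days, 1 day, 2-3 days, 4-7 days, and so on). The function name and the bucket scheme are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def exponential_decay_keep(ages_in_days: Iterable[int]) -> List[int]:
    """Return the ages of the backups to keep: the newest one per doubling bucket."""
    newest_per_bucket: Dict[int, int] = {}
    for age in sorted(ages_in_days):
        if age < 0:
            raise ValueError("age must be non-negative")
        # bit_length maps 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, 8..15 -> 4, ...
        bucket = age.bit_length()
        # Ages are visited newest first, so the first hit per bucket wins.
        newest_per_bucket.setdefault(bucket, age)
    return sorted(newest_per_bucket.values())
```

With ten daily backups aged 0 to 9 days, this keeps the ones aged 0, 1, 2, 4 and 8 days: recent history stays dense while older history thins out.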
Demo
Making Jenkins Highly Available
Problem: Jenkins failures
• Machine/Jenkins failure has high cost to productivity
Solution: Notified by unhappy customers ;-)
• Issues:
  – Receive emails from unhappy customers and log in and fix it
• You do have JENKINS_HOME backed up elsewhere, don’t you?
Jenkins Enterprise Solution
• Highly Available
  – Set up multiple Jenkins masters
  – Uses JGroups to elect a primary master
  – Promotes a backup master as primary
Bounce Back Faster: High Availability
[Diagram, two panels. Panel 1 labels: Reverse Proxy, Jenkins Cluster, Jenkins Master (x2), JENKINS_HOME, MT. Panel 2 labels: Reverse Proxy, Jenkins Cluster, Jenkins Master, JENKINS_HOME (NFS), MT.]
Demo
Miscellaneous
• Jenkins is not just JENKINS_HOME…think about the slaves
  – Offload builds onto slaves
  – Other executables on the system: git, ruby, java etc. as well
  – Preferably use Chef/Puppet to replicate installations
• What about geo redundancy?
  – Technically you can use HA but network latency comes into play
  – Ideally, use HA in a localized data center and a manual failover to a different geo
• What HA is not?
  – Does not load balance between instances
Preventing failures
Git Validated Merges plugin
How can you delegate more to Jenkins?
• Does your CI server shift work from laptops to servers?
  – You need to commit to have Jenkins test it
  – But if your commit is bad, it blocks others
  – You end up testing locally before committing
  – FAIL
Motivation
• We want to make changes safely
  – Your mistake shouldn’t block others
  – Only push after changes are validated
• We want to run tests asynchronously
  – Your brain has more important things to do
  – Make change and move on
  – Even with TDD!
• We want to run tests on the server
  – Your laptop has more important things to do
Solution: Jenkins should be a Git server
• I push to Jenkins
• Jenkins merges it with upstream
• Jenkins tests it
• If good, Jenkins pushes it upstream
[Diagram labels: gate repo, upstream repo]
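The four steps above can be sketched as a small gate function. This is not the plugin's implementation, only the shape of the workflow expressed with ordinary git commands; the remote name `upstream`, the target branch `master`, and the injected `run` callable are assumptions made so the logic can be read (and exercised) without a real repository.

```python
from typing import Callable, List, Sequence

# A runner executes one command and returns its exit code (0 means success).
Runner = Callable[[Sequence[str]], int]


def validated_merge(run: Runner, branch: str, test_cmd: Sequence[str],
                    upstream: str = "upstream", target: str = "master") -> bool:
    """Merge `branch` onto the upstream tip, test it, and push only if all passes."""
    steps: List[List[str]] = [
        ["git", "fetch", upstream],
        ["git", "checkout", "--detach", f"{upstream}/{target}"],
        ["git", "merge", "--no-edit", branch],
        list(test_cmd),
    ]
    for step in steps:
        if run(step) != 0:
            # Merge conflict or failing tests: stop here, upstream is untouched.
            return False
    return run(["git", "push", upstream, f"HEAD:refs/heads/{target}"]) == 0
```

In real use the runner could be `lambda cmd: subprocess.run(cmd).returncode`, with `test_cmd` set to whatever builds and tests the project. The point of the design is that a bad change never reaches the upstream repository, so it cannot block anyone else.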
Another way to look at it
[Diagram labels: "Tip of master in upstream", "My changes", "Tip of master in upstream"]
Implementation
• Transport
  – HTTP
  – SSH
• JGit embedded in Jenkins for git server functionality
  – A bit of magic like Gerrit to make it seamless
• Additional tags to let you pull submitted changes
Demo
Running an efficient production system
Test Instance
• Run mini 2nd instance
  – Test new core version before putting it to prod
  – Test new versions of plugins
  – Play with new plugins
• Copy over some jobs from prod
• Bootstrap dry-run
  – -Djenkins.model.Jenkins.killAfterLoad=true
Configuring Jenkins for efficiency
• Fast archiver plugin
  – Conserve network bandwidth
• No build on master
  – Also good for security
Managing and Pruning Plugins
Problem: Discovering what plugins are used in an installation
• No visibility if a particular plugin is used or how many jobs use it
Jenkins Enterprise Solution
• Plugin Usage Plugin
  – Tabular view of plugin name, # of jobs and the job names using the plugin
Demo
Monitoring Jenkins
Why?
What?
• What the user sees
  – GUI (load time)
• JVM memory size
  – Beware of several independent pieces
• System load
• Free space on $JENKINS_HOME
• Slave availability
• Queue length
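Of the items listed, free space on $JENKINS_HOME is simple to check from a script. A minimal sketch, assuming a 10% free-space threshold (the threshold and the function names are made up for this example):

```python
import shutil


def free_space_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """True when less than `min_free_fraction` of the disk is still free."""
    return free_space_fraction(path) < min_free_fraction
```

Calling `is_low_on_space("/var/lib/jenkins")` (substitute your own JENKINS_HOME path) from a scheduled job gives an early warning before builds start failing on a full disk.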
Groovy Console
$ cat queue.groovy
j = Jenkins.instance
println j.queue.items.length
$ curl -u "user:apiToken" \
    --data-urlencode script@queue.groovy \
    http://jenkins/scriptText
13
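The same call can be made from a monitoring script instead of curl. The sketch below builds the equivalent POST to `/scriptText` with HTTP basic authentication; the helper names are hypothetical, and the base URL, user and API token are placeholders as on the slide.

```python
import base64
import urllib.parse
import urllib.request


def build_script_request(base_url: str, user: str, api_token: str,
                         script: str) -> urllib.request.Request:
    """Build a POST to <base_url>/scriptText carrying the Groovy script."""
    body = urllib.parse.urlencode({"script": script}).encode("utf-8")
    request = urllib.request.Request(base_url.rstrip("/") + "/scriptText", data=body)
    credentials = base64.b64encode(f"{user}:{api_token}".encode("utf-8"))
    request.add_header("Authorization", "Basic " + credentials.decode("ascii"))
    return request


def run_script(base_url: str, user: str, api_token: str, script: str) -> str:
    """Send the script and return whatever it printed, stripped of whitespace."""
    request = build_script_request(base_url, user, api_token, script)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8").strip()
```

For the queue example, `int(run_script("http://jenkins", "user", "apiToken", "println Jenkins.instance.queue.items.length"))` would yield the queue length as a number that other checks can consume.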
Remote API
$ curl 'http://jenkins/computer/api/json?pretty=true'
{
  "busyExecutors" : 0,
  "totalExecutors" : 2,
  ...
}
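Once the JSON is fetched, turning it into a metric takes a few lines. A small sketch that uses only the two fields shown above (the function name is an assumption):

```python
import json


def executor_utilization(computer_json: str) -> float:
    """Return busyExecutors / totalExecutors from the /computer/api/json payload."""
    data = json.loads(computer_json)
    total = data["totalExecutors"]
    if total == 0:
        # No executors at all: report zero rather than dividing by zero.
        return 0.0
    return data["busyExecutors"] / total
```

A value that stays close to 1.0 while the build queue keeps growing suggests the installation needs more executors.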
Jenkins Monitoring plugin
• JavaMelody in Jenkins
Nagios (or others like it)
• Server app for monitoring stuff
  – Extensible, allowing all sorts of things to be monitored
• Used in jenkins-ci.org/DEV@cloud
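Nagios checks are ordinary programs that print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of a queue-length check built on that convention follows; the thresholds of 10 and 50 and the function name are assumptions for illustration, not values from the slides.

```python
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_queue_length(length: Optional[int], warn: int = 10,
                       crit: int = 50) -> Tuple[int, str]:
    """Map a Jenkins queue length to a Nagios exit code and status line."""
    if length is None or length < 0:
        return UNKNOWN, "QUEUE UNKNOWN - no data"
    # Text after the '|' is Nagios performance data: label=value;warn;crit
    perfdata = f"queue={length};{warn};{crit}"
    if length >= crit:
        return CRITICAL, f"QUEUE CRITICAL - {length} items waiting | {perfdata}"
    if length >= warn:
        return WARNING, f"QUEUE WARNING - {length} items waiting | {perfdata}"
    return OK, f"QUEUE OK - {length} items waiting | {perfdata}"
```

A wrapper script would obtain the queue length (for example through the Groovy console call shown earlier in the deck), print the status line, and pass the code to `sys.exit`.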
Thread dump
• Tells us where Jenkins is stuck
• When?
  – Hang or slowness
• Look for threads that are stuck
  – HTTP request threads
  – Executor threads
How to get a thread dump
• http://jenkins/threadDump
• kill -3 <PID>
Heap dump
• Tells us what’s eating memory
• When?
  – OutOfMemoryError
  – Monitoring shows abnormal growth
• Look for objects that are big
  – Sessions
  – Classes from plugins
How to get a memory dump
• curl -L http://jenkins/heapDump > dump.hprof
• jmap -dump:format=b,file=dump.hprof PID
• -XX:+HeapDumpOnOutOfMemoryError
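When the dump is fetched with curl, a login page or error page can end up saved as `dump.hprof` without any warning. A quick sanity check is to look at the first bytes of the file: binary HPROF dumps begin with the ASCII text `JAVA PROFILE` followed by a version number. The function name below is an assumption.

```python
HPROF_MAGIC = b"JAVA PROFILE "


def looks_like_hprof(path: str) -> bool:
    """True if the file starts with the HPROF header text."""
    with open(path, "rb") as handle:
        return handle.read(len(HPROF_MAGIC)) == HPROF_MAGIC
```

This only checks the header; it does not prove the dump is complete, but it catches the common case of having downloaded HTML by mistake.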
Wrapping up
• More Info: http://www.cloudbees.com/jenkins-enterprise-by-cloudbees-overview.cb
• Free Trial: http://www.cloudbees.com/jenkins-enterprise-by-cloudbees-download.cb
• Wiki Page: https://wiki.cloudbees.com/bin/view/Jenkins+Enterprise/WebHome
• User Guide: http://jenkins-enterprise.cloudbees.com/docs/user-guide-bundle/index.html#
Thank You!