
Fabric, Cuisine and Watchdog for server administration in Python

Date post: 08-Sep-2014
Upload: ffunction-inc
Description:
Presents Fabric, Cuisine and Watchdog, three Python tools that will help you set up, administer and monitor your servers.
ffunction inc. Fabric, Cuisine & Watchdog Sébastien Pierre, ffunction inc. @Montréal Python, February 2011 www.ffctn.com
Transcript
Page 1: Fabric, Cuisine and Watchdog for server administration in Python

ffunction inc.

Fabric, Cuisine & Watchdog

Sébastien Pierre, ffunction inc. @ Montréal Python, February 2011

www.ffctn.com


How to use Python for Server Administration, thanks to Fabric, Cuisine* & Watchdog*

* custom tools


The way we use servers has changed


The era of dedicated servers

[Diagram: a WEB SERVER, a DATABASE SERVER and an EMAIL SERVER, hosted in your server room or in colocation]


Sysadmins typically SSH and configure the servers live.


The servers are conservatively managed; updates are risky.


The era of slices/VPS

[Diagram: many slices hosted at Linode.com and Amazon EC2]

We now have multiple small virtual servers (slices/VPS).


Often located in different data centers


...and sometimes with different providers


[Diagram, continued: DEDICATED SERVER 1 and DEDICATED SERVER 2 at IWeb.com]

We even sometimes still have physical, dedicated servers.


The challenge

ORDER SERVER → SETUP SERVER → DEPLOY APPLICATION


MAKE THIS PROCESS AS FAST (AND SIMPLE) AS POSSIBLE


Create users and groups, customize config files, install base packages.


Install app-specific packages, deploy the application, start services.


Quickly integrate your new server in the existing architecture


...and make sure it's running!


Today's menu

FABRIC: interact with your remote machines as if they were local

CUISINE: takes care of users, groups, packages and configuration of your new machine

WATCHDOG: ensures that your servers and services are up and running


Cuisine and Watchdog are made by ffunction inc.


Part 1

Fabric - http://fabfile.org

application deployment & systems administration tasks


Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.


Wait... what does that mean?


Streamlining SSH

By hand:

    version = os.popen("ssh myserver 'cat /proc/version'").read()

Using Fabric:

    version = run("cat /proc/version")


Using Fabric, with the import and the target host:

    from fabric.api import *

    env.hosts = ["myserver"]
    version = run("cat /proc/version")


You can specify multiple hosts and run the same commands across them.


Connections will be lazily created and pooled.


Failures ($STATUS) will be handled just like in Make: by default, a command that exits with a non-zero status aborts the task.
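Here is a local Python 3 sketch that makes the Make analogy concrete. It is not Fabric code. `run_local` and `CommandFailed` are hypothetical names, and the command runs on the local machine through `subprocess`. The `warn_only` flag only imitates the idea behind Fabric's setting of the same name: let the caller inspect a failure instead of aborting.

```python
import subprocess


class CommandFailed(Exception):
    """Raised when a command exits with a non-zero status."""


def run_local(command, warn_only=False):
    """Run a shell command on the local machine and return (status, output).

    Illustration only, not Fabric's run(): by default a non-zero exit
    status raises CommandFailed, the way Make stops at a failing rule.
    With warn_only=True the status is returned instead, so the caller
    can decide what to do.
    """
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    if proc.returncode != 0 and not warn_only:
        raise CommandFailed("%r exited with status %d" % (command, proc.returncode))
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    print(run_local("echo hello"))             # (0, 'hello')
    print(run_local("false", warn_only=True))  # non-zero status, no exception
    try:
        run_local("exit 3")
    except CommandFailed as error:
        print("aborted:", error)
```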


Example: Installing packages

    sudo("aptitude install nginx")

    if run("dpkg -s %s | grep 'Status:' ; true" % package).find("installed") == -1:
        sudo("aptitude install '%s'" % package)


It's easy to take action depending on the result.


Note that we add true so that the run() always succeeds*

* there are other ways...
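The `find("installed")` test is deliberately rough. A state such as `not-installed` or `half-installed` also contains that word. A stricter check reads the last word of the `Status:` line printed by `dpkg -s`. The Python 3 sketch below uses the hypothetical helper names `dpkg_state`, `needs_install` and `install_command`. It only parses text, so it can be tried without a server.

```python
def dpkg_state(dpkg_output):
    """Return the last word of the Status: line of `dpkg -s` output, or None.

    The line holds three words: the wanted action, an error flag and the
    current state, e.g. "Status: install ok installed".
    """
    for line in dpkg_output.splitlines():
        if line.startswith("Status:"):
            words = line.split()
            return words[-1] if len(words) > 1 else None
    return None


def needs_install(dpkg_output):
    # Only the exact state "installed" means there is nothing to do;
    # states such as "not-installed", "half-installed" or "config-files"
    # (or no Status line at all, for an unknown package) mean there is.
    return dpkg_state(dpkg_output) != "installed"


def install_command(package):
    # Hypothetical helper: the command string a fabfile would pass to sudo().
    # -y answers aptitude's confirmation prompt non-interactively.
    return "aptitude install -y '%s'" % package
```

In a Fabric fabfile it could be combined with the calls above roughly as `if needs_install(run("dpkg -s %s ; true" % package)): sudo(install_command(package))`. This assumes run() returns the command's output as a string.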


Example: retrieving system status

    disk_usage = run("df -kP")
    mem_usage = run("cat /proc/meminfo")
    cpu_usage = run("cat /proc/stat")

    print disk_usage, mem_usage, cpu_usage


Very useful for getting live information from many different servers.
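Because run() hands back the command output as a string, the numbers can be extracted in plain Python before comparing servers. This is a minimal sketch. It assumes the POSIX column layout that `df -kP` produces and the `Key: value kB` lines of /proc/meminfo. `parse_df` and `parse_meminfo` are hypothetical names, and the sample text is invented for the example.

```python
def parse_df(output):
    """Map each mount point to its percentage of used space.

    Expects `df -kP` output: a header line, then one line per filesystem
    with six columns (filesystem, 1024-blocks, used, available,
    capacity, mount point).
    """
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header
        fields = line.split(None, 5)  # the mount point may contain spaces
        if len(fields) == 6:
            usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


def parse_meminfo(output):
    """Map each /proc/meminfo key to its numeric value (usually in kB)."""
    info = {}
    for line in output.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


DF_SAMPLE = """Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/sda1         20511356 9133176  10313220      47% /
tmpfs               509260       0    509260       0% /dev/shm
"""

MEMINFO_SAMPLE = """MemTotal:        2048000 kB
MemFree:          512000 kB
"""

if __name__ == "__main__":
    print(parse_df(DF_SAMPLE))            # {'/': 47, '/dev/shm': 0}
    print(parse_meminfo(MEMINFO_SAMPLE))  # {'MemTotal': 2048000, 'MemFree': 512000}
```

In a fabfile these would be applied to the strings returned above, e.g. `parse_df(disk_usage)`.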


Fabfile.py

    from fabric.api import *
    from mysetup import *

    env.hosts = ["server1.myapp.com"]

    def setup():
        install_packages("...")
        update_configuration()
        create_users()
        start_daemons()

$ fab setup


Just like Make, you write rules that do something...


...and you can specify on which servers the rules will run.


Multiple hosts

    @hosts("db1.myapp")
    def backup_db():
        run(...)

    env.hosts = [
        "db1.myapp.com",
        "db2.myapp.com",
        "db3.myapp.com",
    ]


Roles

$ fab -R web setup

    env.roledefs = {
        'web': ['www1', 'www2', 'www3'],
        'dns': ['ns1', 'ns2']
    }


This will run the setup rule only on hosts that are members of the web role.
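Here is a Python 3 sketch of how a task's target hosts might be assembled from explicit hosts and role definitions. This is a simplified illustration, not Fabric's implementation. It ignores Fabric's precedence rules between command-line options, decorators and env. `resolve_hosts` is a hypothetical name. It takes the union of the listed hosts and the members of the requested roles, dropping duplicates while keeping order.

```python
def resolve_hosts(roledefs, roles=(), hosts=()):
    """Return the hosts a task should run on, in order, without duplicates.

    roledefs maps a role name to its list of hosts, like env.roledefs.
    An unknown role raises KeyError.
    """
    result = []
    for host in list(hosts) + [h for role in roles for h in roledefs[role]]:
        if host not in result:
            result.append(host)
    return result


ROLEDEFS = {
    "web": ["www1", "www2", "www3"],
    "dns": ["ns1", "ns2"],
}

if __name__ == "__main__":
    print(resolve_hosts(ROLEDEFS, roles=["web"]))  # ['www1', 'www2', 'www3']
```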


What's good about Fabric?

Low-level: basically an ssh() command that returns the result

Simple primitives: run(), sudo(), get(), put(), local(), prompt(), reboot()

No magic: no DSL, no abstraction, just a remote command API


What could be improved?

Ease common admin tasks: user and group creation, file and directory operations

Abstract primitives: like installing a package, so that it works with different OSes

Templates: to make creating/updating configuration files easy


Cuisine: Chef-like functionality for Fabric


Part 2

Cuisine


What is Opscode's Chef?

Recipes: scripts/packages to install and configure services and applications

API: a DSL-like Ruby API to interact with the OS (create users, groups, install packages, etc.)

Architecture: client-server or “solo” mode to push and deploy your new configurations

http://wiki.opscode.com/display/chef/Home


What I liked about Chef

Flexible: you can use the API or shell commands

Structured: helped me have a clear decomposition of the services installed per machine

Community: lots of recipes already available from http://cookbooks.opscode.com/


What I didn't like

Too many files and directories: code is spread out, hard to get the big picture

Abstraction overload: API not very well documented, frequent fallbacks to plain shell scripts within the recipe

No “smart” recipe: recipes are applied all the time, even when it's not necessary


The question that kept coming...

Django recipe: 5 files, 2 directories

What it does, in essence:

    sudo aptitude install apache2 python python-django


Is this really necessary for what I want to do?


What I loved about Fabric

Bare metal: ssh() function, simple and elegant set of primitives

No magic: no abstraction, no model, no compilation

Two-way communication: easy to change the rule's behaviour according to the output (e.g. do not install something that's already installed)


What I needed

Fabric


File I/O


User/Group Management


Package Management


Text processing & Templates


How I wanted it

Simple “flat” API: [object]_[operation], where operation is one of “create”, “read”, “update”, “write”, “remove”, “ensure”, etc.

Driven by need: only implement a feature if I have a real need for it

No magic: everything is implemented using sh-compatible commands

No unnecessary structure: everything fits in one file, no imposed file layout
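The `ensure` operation is what makes a setup rule safe to re-run: check the current state first and act only when needed. Below is a minimal local sketch of that check-then-act pattern in Python 3. `ensure` and `local_dir_ensure` are hypothetical names. A directory on the local machine stands in for the remote server that Cuisine works on.

```python
import os
import tempfile


def ensure(exists, create):
    """Check-then-act: call create() only when exists() is false.

    Returns True if something was changed, False if it was already fine.
    """
    if exists():
        return False
    create()
    return True


def local_dir_ensure(path):
    # Local stand-in for a remote dir_ensure(): creating the directory
    # only when it is missing makes repeated runs harmless.
    return ensure(lambda: os.path.isdir(path), lambda: os.makedirs(path))


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "data", "logs")
    print(local_dir_ensure(target))  # True: the directory was created
    print(local_dir_ensure(target))  # False: nothing left to do
```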


Cuisine: Example fabfile.py

    from cuisine import *

    env.hosts = ["server1.myapp.com"]

    def setup():
        package_ensure("python", "apache2", "python-django")
        user_ensure("admin", uid=2000)
        upstart_ensure("django")

$ fab setup


Fabric's core functions are already imported.


package_ensure(), user_ensure() and upstart_ensure() are Cuisine's API calls.


File I/O


Cuisine: File I/O

● file_exists: does the remote file exist?
● file_read: reads a remote file
● file_write: writes data to a remote file
● file_append: appends data to a remote file
● file_attribs: chmod & chown
● file_remove


Supports owner/group and mode changes.


Cuisine: File I/O (directories)

● dir_exists: does the remote directory exist?
● dir_ensure: ensures that a directory exists
● dir_attribs: chmod & chown
● dir_remove


Cuisine: File I/O +

● file_update(location, updater=lambda _:_)

    package_ensure("mongodb-snapshot")

    def update_configuration(text):
        res = []
        for line in text.split("\n"):
            if line.strip().startswith("dbpath="):
                res.append("dbpath=/data/mongodb")
            elif line.strip().startswith("logpath="):
                res.append("logpath=/data/logs/mongodb.log")
            else:
                res.append(line)
        return "\n".join(res)

    file_update("/etc/mongodb.conf", update_configuration)


This replaces the values for the configuration entries dbpath and logpath.


The remote file will only be changed if the content is different.
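The only-if-different behaviour of file_update() can be illustrated on a local file: read the content, apply the updater, and write back only when the result differs. This Python 3 sketch uses a hypothetical `local_file_update` function. It is not Cuisine's implementation, which works on the remote host. `update_configuration` is restated from above so the example is self-contained.

```python
import os
import tempfile


def local_file_update(path, updater=lambda text: text):
    """Apply updater to the file's text; rewrite the file only if it changed.

    Returns True when the file was rewritten, False when left untouched.
    """
    with open(path) as f:
        old = f.read()
    new = updater(old)
    if new == old:
        return False
    with open(path, "w") as f:
        f.write(new)
    return True


def update_configuration(text):
    # Same updater as the mongodb example above.
    res = []
    for line in text.split("\n"):
        if line.strip().startswith("dbpath="):
            res.append("dbpath=/data/mongodb")
        elif line.strip().startswith("logpath="):
            res.append("logpath=/data/logs/mongodb.log")
        else:
            res.append(line)
    return "\n".join(res)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "mongodb.conf")
    with open(path, "w") as f:
        f.write("dbpath=/var/lib/mongodb\nbind_ip=127.0.0.1\n")
    print(local_file_update(path, update_configuration))  # True
    print(local_file_update(path, update_configuration))  # False
```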


User Management


Cuisine: User Management

● user_exists: does the user exist?
● user_create: creates the user
● user_ensure: creates the user if it doesn't exist


Cuisine: Group Management

● group_exists: does the group exist?
● group_create: creates the group
● group_ensure: creates the group if it doesn't exist
● group_user_exists: does the user belong to the group?
● group_user_add: adds the user to the group
● group_user_ensure


Package Management


Cuisine: Package Management

● package_exists: is the package available?
● package_installed: is it installed?
● package_install: installs the package
● package_ensure: ... only if it's not installed
● package_upgrade: upgrades the/all package(s)


Text & Templates


Cuisine: Text transformation

text_ensure_line(text, *lines)

    file_update("/home/user/.profile", lambda _: text_ensure_line(_,
        "PYTHONPATH=/opt/lib/python:${PYTHONPATH};",
        "export PYTHONPATH"
    ))


Ensures that the PYTHONPATH variable is set and exported. If not, these lines will be appended.
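Here is a Python 3 sketch of what ensuring lines in a text can look like. `ensure_lines` is a hypothetical stand-in, not Cuisine's code. It assumes `\n` line endings, compares whole lines exactly, and keeps a trailing newline if the original text had one.

```python
def ensure_lines(text, *lines):
    """Return text with each of lines appended unless already present."""
    trailing_eol = text.endswith("\n")
    res = text.split("\n") if text else []
    if trailing_eol:
        res.pop()  # drop the empty string that follows the final newline
    for line in lines:
        if line not in res:  # exact, whole-line comparison
            res.append(line)
    joined = "\n".join(res)
    return joined + "\n" if trailing_eol else joined


if __name__ == "__main__":
    profile = "export EDITOR=vi\n"
    profile = ensure_lines(profile,
                           "PYTHONPATH=/opt/lib/python:${PYTHONPATH};",
                           "export PYTHONPATH")
    print(profile)
    # Running it again changes nothing: the lines are already there.
    print(ensure_lines(profile, "export PYTHONPATH") == profile)  # True
```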


Cuisine: Text transformation

text_replace_line(text, old, new, find=.., process=...)

    configuration = local_read("server.conf")
    for key, value in variables.items():
        configuration, replaced = text_replace_line(configuration,
            key + "=",
            key + "=" + repr(value),
            process=lambda text: text.split("=")[0].strip()
        )


Replaces lines that look like VARIABLE=VALUE with the actual values from the variables dictionary.


The process lambda transforms input lines before comparing them. Here the lines are stripped of spaces and of their value.
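This is a minimal Python 3 sketch of line replacement with a `process` function. `replace_lines` is a hypothetical name, not Cuisine's code. It assumes that `process` is applied to both the candidate line and `old` before the comparison. That is what lets `PORT = 8080` match the pattern `PORT=` once both are reduced to `PORT`.

```python
def replace_lines(text, old, new, process=lambda line: line):
    """Replace every line whose processed form equals process(old).

    Returns (new_text, number_of_replacements).
    """
    target = process(old)
    res, replaced = [], 0
    for line in text.split("\n"):
        if process(line) == target:
            res.append(new)
            replaced += 1
        else:
            res.append(line)
    return "\n".join(res), replaced


def key_of(line):
    # Keep only the part before "=", without surrounding spaces.
    return line.split("=")[0].strip()


if __name__ == "__main__":
    configuration = "PORT = 8080\nHOST=localhost\n# comment"
    variables = {"PORT": 9000}
    for key, value in variables.items():
        configuration, replaced = replace_lines(
            configuration, key + "=", key + "=" + repr(value), process=key_of)
    print(configuration)  # PORT=9000, then the two untouched lines
```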


Cuisine: Text transformation

text_strip_margin(text)

    file_write(".profile", text_strip_margin("""
        |export PATH="$HOME/bin":$PATH
        |set -o vi
    """))


Everything after the | separator will be output as content. This makes it easy to embed text templates within functions.
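The margin trick is easy to sketch in Python 3. `strip_margin` is a hypothetical name. It assumes, as one plausible reading of the slide, that lines without a `|` are dropped entirely. That is why the empty first line and the indentation-only last line of the triple-quoted string disappear.

```python
def strip_margin(text, margin="|"):
    """Keep, for each line containing margin, only what follows it.

    Lines without a margin are dropped.
    """
    res = []
    for line in text.split("\n"):
        _, sep, after = line.partition(margin)
        if sep:
            res.append(after)
    return "\n".join(res)


if __name__ == "__main__":
    profile = strip_margin("""
        |export PATH="$HOME/bin":$PATH
        |set -o vi
    """)
    print(profile)
```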


Cuisine: Text transformation

text_template(text, variables)

    text_template(text_strip_margin("""
        |cd ${DAEMON_PATH}
        |exec ${DAEMON_EXEC_PATH}
    """), dict(
        DAEMON_PATH="/opt/mongodb",
        DAEMON_EXEC_PATH="/opt/mongodb/mongod"
    ))


This is a simple wrapper around Python's string.Template class, using its safe substitution.
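In the standard library the class is `string.Template`. Its method `safe_substitute()` leaves unknown `${NAME}` placeholders untouched instead of raising `KeyError`. Below is a minimal wrapper in that spirit, with `render` as a hypothetical name, applied to the daemon example with the margins already stripped.

```python
from string import Template


def render(text, variables):
    """Substitute ${NAME} placeholders; unknown names are left as-is."""
    return Template(text).safe_substitute(variables)


script = render("cd ${DAEMON_PATH}\nexec ${DAEMON_EXEC_PATH}", dict(
    DAEMON_PATH="/opt/mongodb",
    DAEMON_EXEC_PATH="/opt/mongodb/mongod",
))

if __name__ == "__main__":
    print(script)
```

Using `substitute()` instead would raise `KeyError` on a missing variable, which can be preferable when a silently unfilled placeholder would produce a broken config file.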


Cuisine: Goodies

● ssh_keygen: generates DSA keys
● ssh_authorize: authorizes your key on the remote server
● mode_sudo: run() always uses sudo
● upstart_ensure: ensures the given daemon is running

& more!


Why use Cuisine?

● Simple API for remote-server manipulation: files, users, groups, packages
● Shell commands for specific tasks only: avoid problems with your shell commands by only using run() for very specific tasks
● Cuisine tasks are not stupid: *_ensure() commands won't do anything if it's not necessary


Limitations

● Limited to sh shells: operations will not work under csh
● Only written/tested for Ubuntu Linux: contributors could easily port commands


Get started!

On GitHub: http://github.com/sebastien/cuisine

1 short Python file, documented API


Part 3

Watchdog

Server and services monitoring


The problem


Low disk space


Archive files, rotate logs, purge cache


HTTP server has high latency


Restart HTTP server


System load is too high


Re-nice important processes


We want to be notified when incidents happen


We want automatic actions to be taken whenever possible


(Some of the) existing solutions

Monit, God, Supervisord, Upstart: focus on starting/restarting daemons and services

Munin, Cacti: focus on visualization of RRDTool data

Collectd: focus on collecting and publishing data


The ideal tool

Wide spectrum: data collection, service monitoring, actions

Easy setup and deployment: no complex installation or configuration

Flexible server architecture: can monitor local or remote processes

Customizable and extensible: from restarting daemons to monitoring whole servers


Hello, Watchdog!

[Diagram: a SERVICE groups RULES; RULES trigger ACTIONS]


A service is a collection of RULES.


Rules: HTTP request, CPU, disk, mem %, process status, I/O bandwidth


Each rule retrieves data and processes it. Rules can SUCCEED or FAIL.


Actions: logging, XMPP and email notifications, start/stop process…


Actions are bound to a rule and triggered on rule SUCCESS or FAILURE.


Execution Model

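[Diagram: the MONITOR runs each SERVICE DEFINITION; each RULE has a frequency in ms and lists of SUCCESS and FAILURE ACTIONS]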


Services are registered in the monitor.


Rules defined in the service are executed every N ms (frequency).


If the rule SUCCEEDS, its success actions will be executed sequentially.


If the rule FAILS, its failure actions will be executed sequentially.
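The execution model can be sketched in a few lines of Python 3. This illustrates the idea only; it is not Watchdog's implementation or API. `Rule`, `Service` and `tick` are hypothetical names, and a rule's check is any callable returning True (SUCCESS) or False (FAILURE). Instead of sleeping in a real loop, `tick` performs one pass at a given time in milliseconds and runs the rules whose period has elapsed.

```python
class Rule:
    """A check run every freq_ms milliseconds, with actions for each outcome."""

    def __init__(self, check, freq_ms, success=(), fail=()):
        self.check = check            # callable returning True (SUCCESS) or False (FAILURE)
        self.freq_ms = freq_ms
        self.success = list(success)  # actions run in order when check() succeeds
        self.fail = list(fail)        # actions run in order when check() fails
        self.last_run = None

    def due(self, now_ms):
        return self.last_run is None or now_ms - self.last_run >= self.freq_ms

    def run(self, now_ms):
        self.last_run = now_ms
        ok = self.check()
        for action in (self.success if ok else self.fail):
            action(self)  # sequential, in list order
        return ok


class Service:
    """A named collection of rules."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = list(rules)


def tick(services, now_ms):
    """One pass of the monitor: run every rule that is due, return the outcomes."""
    outcomes = []
    for service in services:
        for rule in service.rules:
            if rule.due(now_ms):
                outcomes.append((service.name, rule.run(now_ms)))
    return outcomes


if __name__ == "__main__":
    log = []
    disk = Rule(
        check=lambda: False,  # pretend the disk is always nearly full
        freq_ms=1000,
        fail=[lambda rule: log.append("notify"), lambda rule: log.append("rotate logs")],
    )
    services = [Service("disk-space", [disk])]
    print(tick(services, 0))     # [('disk-space', False)]
    print(tick(services, 500))   # [] (not due yet)
    print(tick(services, 1000))  # [('disk-space', False)]
    print(log)                   # ['notify', 'rotate logs', 'notify', 'rotate logs']
```

A long-running monitor would call `tick()` repeatedly with a clock such as `time.monotonic() * 1000`, sleeping briefly between passes.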


Monitoring a remote machine

    #!/usr/bin/env python
    from watchdog import *

    Monitor(
        Service(
            name = "google-search-latency",
            monitor = (
                HTTP(
                    GET="http://www.google.ca/search?q=watchdog",
                    freq=Time.s(1),
                    timeout=Time.ms(80),
                    fail=[
                        Print("Google search query took more than 50ms")
                    ]
                )
            )
        )
    ).run()


A monitor is like the “main” for Watchdog: it actively monitors services.

Page 113

Monitoring a remote machine

Don't forget to call run() on it.

Page 114

Monitoring a remote machine

The service monitors the rules.

Page 115

Monitoring a remote machine

The HTTP rule lets us test a URL.

And we display a message in case of failure.

Page 116

Monitoring a remote machine

If the request returns a 4XX status or times out, the rule fails and the error message is displayed.
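To make these failure conditions concrete, here is a rough standard-library equivalent of the HTTP rule's check. It is a sketch under assumptions, not Watchdog's implementation: the `http_rule` name is hypothetical, and `urllib`'s `timeout` bounds each blocking socket operation rather than the total request time, so it only approximates an 80 ms latency budget.

```python
import urllib.error
import urllib.request


def http_rule(url: str, timeout_s: float) -> bool:
    """Return True if GET `url` succeeds, False on an HTTP error status or a timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4XX responses (and for 5XX ones as well).
        return False
    except OSError:
        # URLError, socket timeouts and refused connections are all OSError subclasses.
        return False
```

For example, `http_rule("http://www.google.ca/search?q=watchdog", 0.08)` returns `False` when the search does not answer in time, which is the point where a failure action such as printing a message would run.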

Page 117

Monitoring a remote machine

$ python example-service-monitoring.py
2011-02-27T22:33:18 watchdog --- #0 (runners=1,threads=2,duration=0.57s)
2011-02-27T22:33:18 watchdog [!] Failure on HTTP(GET="www.google.ca:80/search?q=watchdog",timeout=0.08) : Socket error: timed out
Google search query took more than 50ms
2011-02-27T22:33:19 watchdog --- #1 (runners=1,threads=2,duration=0.73s)
2011-02-27T22:33:20 watchdog --- #2 (runners=1,threads=2,duration=0.54s)
2011-02-27T22:33:21 watchdog --- #3 (runners=1,threads=2,duration=0.69s)
2011-02-27T22:33:22 watchdog --- #4 (runners=1,threads=2,duration=0.77s)
2011-02-27T22:33:23 watchdog --- #5 (runners=1,threads=2,duration=0.70s)

Page 118

Sending Email Notification

send_email = Email(
    "[email protected]",
    "[Watchdog]Google Search Latency Error",
    "Latency was over 80ms",
    "smtp.gmail.com",
    "myusername",
    "mypassword"
)

[…]
HTTP(
    GET="http://www.google.ca/search?q=watchdog",
    freq=Time.s(1),
    timeout=Time.ms(80),
    fail=[
        send_email
    ]
)

Page 119

Sending Email Notification

The Email rule will send an email to [email protected] when triggered.
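For comparison, this is roughly what sending such a notification looks like with the standard library's `smtplib` and `email` modules. It is a sketch under assumptions, not how Watchdog's `Email` action is written: the `build_alert` and `send_alert` helpers are hypothetical, and port 587 with STARTTLS is assumed because that is the usual submission setup for servers such as `smtp.gmail.com`.

```python
import smtplib
from email.message import EmailMessage


def build_alert(to_addr: str, subject: str, body: str, from_addr: str) -> EmailMessage:
    """Build the notification message (no network access needed)."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg: EmailMessage, host: str, user: str, password: str,
               port: int = 587) -> None:
    """Send `msg` through an SMTP server, upgrading to TLS before logging in."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Keeping message construction separate from delivery means the message can be inspected offline; `send_alert(msg, "smtp.gmail.com", "myusername", "mypassword")` would then deliver it.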

Page 120

Sending Email Notification

This is how we bind the action to the rule failure.

Page 121

Sending Email+Jabber Notification

send_xmpp = XMPP(
    "[email protected]",
    "Watchdog: Google search latency over 80ms",
    "[email protected]",
    "mypassword"
)

[…]
HTTP(
    GET="http://www.google.ca/search?q=watchdog",
    freq=Time.s(1),
    timeout=Time.ms(80),
    fail=[
        send_email, send_xmpp
    ]
)

Page 122

Monitoring incidents: when something fails repeatedly during a given period of time

Page 123

Monitoring incidents: when something fails repeatedly during a given period of time

You don't want to be notified all the time, only when it really matters.

Page 124

Detecting incidents

HTTP(
    GET="http://www.google.ca/search?q=watchdog",
    freq=Time.s(1),
    timeout=Time.ms(80),
    fail=[
        # Notify only after 5 errors within 10 seconds
        Incident(
            errors = 5,
            during = Time.s(10),
            actions = [send_email, send_xmpp]
        )
    ]
)

Page 125

Detecting incidents

An incident is a “smart” action: it only does something when its condition is met.

Page 126

Detecting incidents

When at least 5 errors...

Page 127

Detecting incidents

...happen within a 10-second period

Page 128

Detecting incidents

The Incident action will trigger the given actions.
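The incident logic can be approximated with a sliding time window. The sketch below is hypothetical (the `IncidentWindow` class is not part of Watchdog): it records failure timestamps, discards those older than `during_s`, and runs the actions once `errors` failures remain in the window. Clearing the window after firing is our own assumption, made so that one burst of errors sends one notification rather than one per further failure.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional


class IncidentWindow:
    """Fire `actions` once `errors` failures occur within `during_s` seconds."""

    def __init__(self, errors: int, during_s: float,
                 actions: List[Callable[[], None]]) -> None:
        self.errors = errors
        self.during_s = during_s
        self.actions = actions
        self.failures: Deque[float] = deque()

    def __call__(self, now: Optional[float] = None) -> bool:
        """Record one failure; return True if the incident actions were triggered."""
        now = time.monotonic() if now is None else now
        self.failures.append(now)
        # Forget failures that have fallen out of the time window.
        while self.failures and now - self.failures[0] > self.during_s:
            self.failures.popleft()
        if len(self.failures) >= self.errors:
            for action in self.actions:
                action()
            self.failures.clear()  # start counting afresh after notifying
            return True
        return False
```

Scattered failures never accumulate past the threshold, while a tight burst of five within ten seconds triggers the actions exactly once.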

Page 129

Example: Ensuring a service is running

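from watchdog import *

Monitor(
    Service(
        name="myservice-ensure-up",
        monitor=(
            # Every 500 ms, try to GET the local service
            HTTP(
                GET="http://localhost:8000/",
                freq=Time.ms(500),
                fail=[
                    # After 5 failures within 5 seconds, restart the service script
                    Incident(
                        errors=5,
                        during=Time.s(5),
                        actions=[
                            Restart("myservice-start.py")
                        ]
                    )
                ]
            )
        )
    )
).run()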

Page 130

Example: Ensuring a service is running

Every 500 ms, we test whether we can GET http://localhost:8000

Page 131

Example: Ensuring a service is running

If we fail to reach it 5 times within 5 seconds

Page 132

Example: Ensuring a service is running

We kill and restart myservice-start.py
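Killing and restarting a process can be sketched with `subprocess`. This `Restarter` class is hypothetical and only illustrates the idea behind `Restart("myservice-start.py")`: it terminates the managed process, waits up to five seconds (an arbitrary choice) before force-killing it, and then launches the command again.

```python
import subprocess
import sys
from typing import List, Optional


class Restarter:
    """Kill the managed process (if it is running) and start it again."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.process: Optional[subprocess.Popen] = None

    def start(self) -> subprocess.Popen:
        self.process = subprocess.Popen(self.command)
        return self.process

    def __call__(self) -> subprocess.Popen:
        if self.process is not None and self.process.poll() is None:
            # Ask the process to stop first, then force-kill it if it lingers.
            self.process.terminate()
            try:
                self.process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.process.kill()
                self.process.wait()
        return self.start()


# Hypothetical usage: the action an incident would call.
restart_service = Restarter([sys.executable, "myservice-start.py"])
```

Calling the instance when nothing is running simply starts the command, so the same object works for the first launch and for later restarts.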

Page 133

Example: Monitoring system health

from functools import reduce
from watchdog import *

Monitor (
    Service(
        name = "system-health",
        monitor = (
            # Every second, log memory, worst disk and CPU usage
            SystemInfo(
                freq=Time.s(1),
                success = (
                    LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]),
                    LogResult("myserver.system.disk", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                    LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]),
                )
            ),
            # Every second, log the change in eth0 traffic (in MB)
            Delta(
                Bandwidth("eth0", freq=Time.s(1)),
                extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                success = [LogResult("myserver.system.eth0.sent")]
            ),
            # Every minute, log to a file if CPU, disk or memory usage exceeds 90%
            SystemHealth(
                cpu=0.90, disk=0.90, mem=0.90,
                freq=Time.s(60),
                fail=[Log(path="watchdog-system-failures.log")]
            ),
        )
    )
).run()

Page 134

Monitoring system health

Page 135

Monitoring system health

SystemInfo will retrieve system information and return it as a dictionary.
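A comparable dictionary can be assembled from the standard library. The keys mirror the ones the example extracts (`memoryUsage`, `diskUsage`, `cpuUsage`), but the measurements are our own assumptions rather than what Watchdog's `SystemInfo` computes: memory comes from Linux's `/proc/meminfo`, CPU is approximated by the one-minute load average divided by the CPU count (which can exceed 1.0), and disk usage is reported per mount point as a fraction.

```python
import os
import shutil
from typing import Dict, Iterable, Optional


def memory_usage() -> Optional[float]:
    """Fraction of RAM in use, from /proc/meminfo (Linux only, None elsewhere)."""
    try:
        with open("/proc/meminfo") as f:
            # Lines look like "MemTotal:       16318540 kB"
            info = {line.split(":")[0]: int(line.split()[1]) for line in f if line.strip()}
    except (OSError, ValueError, IndexError):
        return None
    total = info.get("MemTotal")
    available = info.get("MemAvailable", info.get("MemFree"))
    if not total or available is None:
        return None
    return 1.0 - available / total


def cpu_usage() -> Optional[float]:
    """One-minute load average per CPU (Unix only, None elsewhere)."""
    try:
        load_1min = os.getloadavg()[0]
    except (AttributeError, OSError):
        return None
    return load_1min / (os.cpu_count() or 1)


def disk_usage(mount_points: Iterable[str] = ("/",)) -> Dict[str, float]:
    """Fraction of space used on each mount point."""
    usage = {}
    for path in mount_points:
        du = shutil.disk_usage(path)
        usage[path] = du.used / du.total
    return usage


def system_info() -> dict:
    return {"memoryUsage": memory_usage(), "cpuUsage": cpu_usage(), "diskUsage": disk_usage()}
```

`max(system_info()["diskUsage"].values())` then gives the fullest disk, which is what the `reduce(max, ...)` lambda in the example extracts.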

Page 136

Monitoring system health

from functools import reduce
from watchdog import *

Monitor (
    Service(
        name = "system-health",
        monitor = (
            SystemInfo(
                freq=Time.s(1),
                success = (
                    LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                    LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                    LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                )
            ),
            Delta(
                Bandwidth("eth0", freq=Time.s(1)),
                extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                success = [LogResult("myserver.system.eth0.sent")]
            ),
            SystemHealth(
                cpu=0.90, disk=0.90, mem=0.90,
                freq=Time.s(60),
                fail=[Log(path="watchdog-system-failures.log")]
            ),
        )
    )
).run()

We log each result by extracting the given value from the result dictionary (memoryUsage, diskUsage, cpuUsage).
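The extract-then-log pattern can be sketched as a small action factory. `log_result` is hypothetical, and unlike the example's two-argument lambdas (`lambda r,_: ...`) its `extract` takes only the result. We also assume the name is used as a plain prefix, which would explain why this version of the slide ends each name with `=`.

```python
from typing import Any, Callable


def log_result(name: str, extract: Callable[[Any], Any],
               emit: Callable[[str], None] = print) -> Callable[[Any], str]:
    """Build an action that pulls one value out of a rule result and logs it."""
    def action(result: Any) -> str:
        line = f"{name}{extract(result)}"  # e.g. "myserver.system.mem=0.42"
        emit(line)
        return line
    return action


mem = log_result("myserver.system.mem=", lambda r: r["memoryUsage"])
disk = log_result("myserver.system.disk=", lambda r: max(r["diskUsage"].values()))
```

Swapping `emit` for a file writer or a network sender is how the same action could feed other destinations.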

Page 137

Monitoring system health

Bandwidth collects live traffic information for a network interface.
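On Linux, per-interface byte counters can be read from `/proc/net/dev`. The parser below is a sketch: the function names and the nested `{"bytes": ...}` shape (chosen to match the example's `v["total"]["bytes"]`) are assumptions. It relies on the received byte count being the first column after the interface name and the transmitted byte count being the ninth.

```python
from typing import Dict


def parse_net_dev(text: str) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Parse /proc/net/dev text into {iface: {"received"|"sent"|"total": {"bytes": n}}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, counters = line.split(":", 1)
        fields = counters.split()
        received, sent = int(fields[0]), int(fields[8])  # RX bytes, TX bytes
        stats[iface.strip()] = {
            "received": {"bytes": received},
            "sent": {"bytes": sent},
            "total": {"bytes": received + sent},
        }
    return stats


def read_bandwidth(iface: str = "eth0") -> Dict[str, Dict[str, int]]:
    """Read the cumulative counters for one interface (Linux only)."""
    with open("/proc/net/dev") as f:
        return parse_net_dev(f.read())[iface]
```

These counters are cumulative since boot, so a single reading says little about current traffic on its own.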

Page 138

Monitoring system health

from functools import reduce
from watchdog import *

Monitor (
    Service(
        name = "system-health",
        monitor = (
            SystemInfo(
                freq=Time.s(1),
                success = (
                    LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                    LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                    LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                )
            ),
            Delta(
                Bandwidth("eth0", freq=Time.s(1)),
                extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                success = [LogResult("myserver.system.eth0.sent")]
            ),
            SystemHealth(
                cpu=0.90, disk=0.90, mem=0.90,
                freq=Time.s(60),
                fail=[Log(path="watchdog-system-failures.log")]
            ),
        )
    )
).run()

But we don't want the total amount, only the difference since the previous reading. Delta does just that.
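Turning a cumulative counter into a per-interval value only requires remembering the previous reading. `CounterDelta` is a hypothetical stand-in for Watchdog's `Delta`, shown with the same extraction as the example (bytes converted to megabytes). It returns `None` for the first sample and makes no attempt to handle counter resets, which would show up as negative values.

```python
from typing import Callable, Optional


class CounterDelta:
    """Report the change in an extracted value between consecutive readings."""

    def __init__(self, extract: Callable[[dict], float]) -> None:
        self.extract = extract
        self.previous: Optional[float] = None

    def __call__(self, sample: dict) -> Optional[float]:
        value = self.extract(sample)
        previous, self.previous = self.previous, value
        # The first reading has nothing to compare against.
        return None if previous is None else value - previous


sent_mb = CounterDelta(lambda v: v["total"]["bytes"] / 1000.0 / 1000.0)
```

With readings taken once per second, each returned value is roughly the number of megabytes transferred during that second.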

Page 139

Monitoring system health

from functools import reduce
from watchdog import *

Monitor (
    Service(
        name = "system-health",
        monitor = (
            SystemInfo(
                freq=Time.s(1),
                success = (
                    LogResult("myserver.system.mem=", extract=lambda r,_:r["memoryUsage"]),
                    LogResult("myserver.system.disk=", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                    LogResult("myserver.system.cpu=", extract=lambda r,_:r["cpuUsage"]),
                )
            ),
            Delta(
                Bandwidth("eth0", freq=Time.s(1)),
                extract = lambda _:_["total"]["bytes"]/1000.0/1000.0,
                success = [LogResult("myserver.system.eth0.sent=")]
            ),
            SystemHealth(
                cpu=0.90, disk=0.90, mem=0.90,
                freq=Time.s(60),
                fail=[Log(path="watchdog-system-failures.log")]
            ),
        )
    )
).run()

We print the result as before.

Page 140

Monitoring system health

SystemHealth will fail whenever usage is above the given thresholds.
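The threshold comparison itself is small. `health_breaches` is a hypothetical sketch of the check described for `SystemHealth`, using the same 0.90 limits as the example; treating a metric that could not be measured (`None`) as healthy is our own choice.

```python
from typing import Dict, List, Optional


def health_breaches(usage: Dict[str, Optional[float]],
                    thresholds: Dict[str, float]) -> List[str]:
    """Return the metrics whose usage is above their threshold.

    A non-empty result means the health check fails; unmeasured metrics are skipped.
    """
    return [
        name
        for name, limit in thresholds.items()
        if usage.get(name) is not None and usage[name] > limit
    ]


limits = {"cpu": 0.90, "disk": 0.90, "mem": 0.90}
```

Returning the offending metric names, rather than a bare boolean, gives a failure action something specific to write into the log.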

Page 141

Monitoring system health

We'll log failures in a log file.
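Appending failures to a file is a natural fit for the standard `logging` module. The helper below is a sketch, not Watchdog's `Log` action; the logger naming scheme and the timestamp format are assumptions.

```python
import logging


def make_failure_logger(path: str) -> logging.Logger:
    """Return a logger that appends timestamped failure records to `path`."""
    logger = logging.getLogger(f"failures:{path}")
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep these records out of the root logger
    if not logger.handlers:  # reuse the existing handler on repeated calls
        handler = logging.FileHandler(path)  # opens in append mode by default
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

For instance, `make_failure_logger("watchdog-system-failures.log").warning("disk usage 0.95 above 0.90")` appends one timestamped line per failure.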

Page 142

Watchdog: Overview

Monitoring DSL: declarative programming to define the monitoring strategy

Wide spectrum: from data collection to incident detection

Flexible: does not impose a specific architecture

Page 143

Watchdog: Use cases

Ensure service availability: test, and stop/restart services when problems occur

Collect system statistics: log them or send the data over the network

Alert on system or service health: take action when system statistics exceed thresholds

Page 144

Get started!

On GitHub: http://github.com/sebastien/watchdog

1 Python file, documented API

Page 145

Merci! (Thank you!)

[email protected]/sebastien

