
Effective service and resource management with systemd

Date post: 21-Jan-2017
Upload: david-timothy-strauss
Page 1: Effective service and resource management with systemd

Effective Service + Resource Management with systemd

Adventures running millions of systemd services for


About Me and Pantheon

● Production users of systemd since 2011

● Millions of units in deployment across hundreds of servers

● Committer since 2012

● Focus has been on journal logging, control group scalability, and general systemd scalability


The Basic Steps

1 Define expected behavior and control

2 Plan for the unexpected

3 Tighten security

4 Manage, monitor, and automate


Service Types

1 Define expected behavior and control


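Type=simple (the default)

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo

After creating or editing the unit file:
# systemctl daemon-reload

● systemctl start foo.service: the service is considered started for dependencies as soon as systemd has forked the ExecStart=/usr/bin/foo process.

● systemctl stop foo.service: the service is considered stopped for dependencies once that process has exited.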


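Type=oneshot

/etc/systemd/system/foo.service
[Service]
Type=oneshot
ExecStart=/usr/bin/foo
TimeoutStartSec=30

● systemctl start foo.service: the service is considered started only once ExecStart=/usr/bin/foo has run to completion, and it is then considered stopped again*.

● systemctl stop foo.service: there is normally nothing left to stop*.

● TimeoutStartSec=30 bounds how long the job may run. RuntimeMaxSec= has no effect here, because it only counts time spent active and a oneshot does all of its work while still activating.

*Unless RemainAfterExit=true, which keeps the unit active after the process exits.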


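Type=forking

/etc/systemd/system/foo.service
[Service]
Type=forking
ExecStart=/usr/bin/foo
PIDFile=/var/run/foo.pid
TimeoutStartSec=30

● systemctl start foo.service: the ExecStart= process forks the real daemon and exits. At that point the service is considered started, and systemd learns the daemon's main PID from PIDFile=/var/run/foo.pid.

● TimeoutStartSec=30 fails the start if the original process has not exited within 30 seconds.

● systemctl stop foo.service stops the daemon that systemd is tracking.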


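Type=notify: Best of All Types

/etc/systemd/system/foo.service
[Service]
Type=notify
ExecStart=/usr/bin/foo
TimeoutStartSec=30
NotifyAccess=all ⬅ maybe

● systemctl start foo.service: the service is considered started only when the daemon reports that it is ready, for example by calling:
systemd-notify --ready

● NotifyAccess=all is needed only when that notification comes from a process other than the main one, such as a systemd-notify child. By default, Type=notify accepts notifications from the main process only.

● TimeoutStartSec=30 fails the start if readiness is not reported within 30 seconds.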
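The daemon can also send the readiness message itself instead of running systemd-notify --ready. Because the message then comes from the main process, the NotifyAccess=all caveat does not apply. Below is a minimal Python sketch of this. It assumes the documented sd_notify datagram protocol, which sends a text message to the Unix socket named in $NOTIFY_SOCKET. The function name sd_notify is borrowed from the libsystemd C API, but this reimplementation is a hypothetical helper and not part of any library.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to systemd.

    Returns False without doing anything when $NOTIFY_SOCKET is unset,
    for example when the program is not running under systemd.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Abstract-namespace socket: the leading "@" stands for a NUL byte.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

Call sd_notify("READY=1") once listening sockets are open and configuration is loaded. Until then, units ordered after foo.service keep waiting, up to TimeoutStartSec=.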


Service Shutdown and Reloading

1 Define expected behavior and control


KillMode=control-group (the default) …or “Oprah’s Favorite Signals”

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo
KillMode=control-group
TimeoutStopSec=30

systemctl stop foo.service:

● SIGTERM goes to every process in the service's control group: the main process (PID=100) and the others (101, 102, 103).

● Any process still running after TimeoutStopSec=30 receives SIGKILL.


KillMode=none

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo
KillMode=none
ExecStop=/usr/bin/fooctl stop

systemctl stop foo.service:

● systemd runs ExecStop=/usr/bin/fooctl stop but sends no signals of its own.

● Processes 100, 101, 102, and 103 keep running unless fooctl stops them. No cleanup.


KillMode=process

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo
KillMode=process

systemctl stop foo.service:

● SIGTERM goes only to the main process (PID=100).

● Processes 101, 102, and 103 are left running. No cleanup.


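KillMode=mixed: Best for Most

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo
KillMode=mixed
TimeoutStopSec=30

systemctl stop foo.service:

● SIGTERM goes only to the main process (PID=100), which can then shut down its children in an orderly way.

● After TimeoutStopSec=30, SIGKILL goes to every process still left in the control group (100, 101, 102, 103).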
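Under KillMode=mixed, only the main process sees the initial SIGTERM. That makes the main process responsible for winding down its own children before TimeoutStopSec= expires. The minimal Python sketch below assumes a main process that supervises hypothetical worker subprocesses (here they just sleep). The grace period should be shorter than TimeoutStopSec=.

```python
import subprocess
import sys


def start_workers(count):
    """Spawn hypothetical worker processes that simply sleep."""
    return [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(count)
    ]


def stop_workers(workers, grace=5.0):
    """What a main process might do on SIGTERM under KillMode=mixed:
    ask every child to exit, wait, and only then force-kill stragglers."""
    for proc in workers:
        proc.terminate()  # SIGTERM on POSIX
    for proc in workers:
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL, as systemd itself would do later
            proc.wait()
    return [proc.returncode for proc in workers]
```

A negative return code means the child was ended by that signal number. For example, -15 means SIGTERM.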


ExecReload=: Use Me

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo
ExecReload=/bin/kill -HUP $MAINPID

● systemctl reload foo.service runs ExecReload=, which here sends SIGHUP to the main process ($MAINPID) so it can reload its configuration without a restart.
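With ExecReload=/bin/kill -HUP $MAINPID, the daemon itself has to turn SIGHUP into a configuration reload. The minimal Python sketch below assumes a hypothetical JSON config file. The handler only sets a flag, and the main loop calls maybe_reload() at a safe point. Keeping the old configuration when the new file is unreadable is one reasonable design choice, not the only one.

```python
import json
import signal

reload_requested = False


def handle_sighup(signum, frame):
    # Only set a flag; the real work happens in the main loop.
    global reload_requested
    reload_requested = True


def install_reload_handler():
    signal.signal(signal.SIGHUP, handle_sighup)


def load_config(path):
    with open(path) as f:
        return json.load(f)


def maybe_reload(path, current):
    """Return a freshly loaded config if SIGHUP arrived, else the current one."""
    global reload_requested
    if not reload_requested:
        return current
    reload_requested = False
    try:
        return load_config(path)
    except (OSError, ValueError):
        # Keep running with the old configuration if the new one is broken.
        return current
```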


Dependencies and Transactions

1 Define expected behavior and control


WantedBy=: Use Me

/etc/systemd/system/foo.service
[Service]
ExecStart=/usr/bin/foo

[Install]
WantedBy=multi-user.target

# systemctl enable foo.service

● Late in bootup, systemd implicitly runs: systemctl start multi-user.target

● Because enable made multi-user.target want foo.service, foo.service is added to that transaction, just as if you had run: systemctl start foo.service

● multi-user.target then completes startup.

Operations in systemd happen in transactions, which are ordered sets of jobs. Targets such as multi-user.target are the successor to runlevels.


Other Dependencies

Inclusion

These dependencies add more units to a transaction. They have no effect on ordering.

● Requires=bar.service
○ If foo.service is starting, bar.service is started too. If bar.service fails to start and foo.service is also ordered After=bar.service, foo.service will not start either.
○ Inverse of RequiredBy=

● Wants=bar.service
○ A weak form of Requires=. If bar.service fails to start, foo.service still starts.
○ Inverse of WantedBy=

● Also=bar.service (in [Install])
○ When foo.service is enabled to start by default, bar.service is enabled too.

Ordering

These dependencies order units within the transaction. They do not add the specified units if those units are not already in it.

● Before=bar.service
○ If bar.service is in the same transaction, bar.service will not begin starting until foo.service has finished starting.

● After=bar.service
○ If bar.service is in the same transaction, foo.service will not begin starting until bar.service has finished starting.

/etc/systemd/system/foo.service
[Unit]
Requires=bar.service
After=bar.service
...
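The split between inclusion and ordering can be seen in a toy Python model. The model assumes a hypothetical in-memory unit table rather than anything systemd exposes. Real systemd also merges conflicting jobs, adds default dependencies, and drops a job to break an ordering cycle; none of that is modeled here.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_transaction(target, units):
    """Toy model of a start transaction.

    `units` maps a unit name to a dict that may contain the lists
    "requires", "wants", "after", and "before".
    Raises graphlib.CycleError on an ordering cycle.
    """
    # 1. Inclusion: Requires= and Wants= pull more units into the transaction.
    jobs, pending = set(), [target]
    while pending:
        name = pending.pop()
        if name in jobs:
            continue
        jobs.add(name)
        spec = units.get(name, {})
        pending.extend(spec.get("requires", []) + spec.get("wants", []))

    # 2. Ordering: After=/Before= only constrain units already included.
    predecessors = {name: set() for name in jobs}
    for name in jobs:
        spec = units.get(name, {})
        for other in spec.get("after", []):
            if other in jobs:
                predecessors[name].add(other)  # other starts first
        for other in spec.get("before", []):
            if other in jobs:
                predecessors[other].add(name)  # name starts first
    return list(TopologicalSorter(predecessors).static_order())
```

In this model, if foo.service declares After=baz.service but nothing pulls baz.service in, baz.service does not appear in the transaction at all.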


Controlling Resources

1 Define expected behavior and control


Control Groups Options for Resources

Absolute Limits

● MemoryLimit=
○ Caution: certain limits cause further allocations by the group to go to swap, hurting system performance.

● TasksMax=
○ Maximum number of processes and threads combined, each counted as a kernel task.

● BlockIOReadBandwidth=
○ Limits block I/O reads to the specified number of bytes per second.

● BlockIOWriteBandwidth=
○ Limits block I/O writes to the specified number of bytes per second.

Relative Controls and More

● CPUShares=
○ Under contention, the kernel divides CPU time in proportion to shares: a service gets its own shares divided by the sum of the shares of all services competing for CPU.

● BlockIOWeight=
○ Under contention, the kernel divides block I/O the same way, using this service's weight divided by the sum of the weights of all competing services.

● nftables for network traffic
○ Not configured in systemd, but nftables can use systemd's control groups for traffic shaping and other rules.
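You can check the proportional split by hand. The minimal Python sketch below does the arithmetic under a simplifying assumption: all services are sibling control groups and all of them are CPU-bound. The unit names are hypothetical, and 1024 is the kernel's default cpu.shares value.

```python
def cpu_fractions(shares, busy):
    """Split CPU among the busy services in proportion to their CPUShares=.

    Simplified model: every service is a sibling control group, and each
    listed one wants more CPU than is available. Idle services are left
    out, so their portion is effectively redistributed to the busy ones.
    """
    total = sum(shares[name] for name in busy)
    return {name: shares[name] / total for name in busy}
```

Suppose foo.service has CPUShares=2048 and two busy siblings each have 1024. Then foo.service gets half the CPU and each sibling a quarter. If one sibling goes idle, the split becomes two thirds versus one third. The same arithmetic applies to BlockIOWeight=.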


Using Traditional ulimit/rlimit Options

● CPU
○ LimitCPU=
○ LimitNPROC=
○ LimitRTPRIO=
○ LimitRTTIME=
○ LimitNICE=

● Disk
○ LimitCORE=
○ LimitFSIZE=

● Memory
○ LimitDATA=
○ LimitSTACK=
○ LimitMSGQUEUE=
○ LimitAS=
○ LimitRSS=
○ LimitMEMLOCK=

● Other
○ LimitSIGPENDING=
○ LimitNOFILE=
○ LimitLOCKS=
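These Limit…= directives are applied as ordinary resource limits before the service's process starts, so the process can read them back with getrlimit(2). The minimal Python sketch below reports the current soft and hard values for the directives listed above. The mapping to names follows the RLIMIT_* constants of setrlimit(2), and the sketch skips any constant that Python does not expose on a given platform.

```python
import resource

# Directive -> name of the matching constant in Python's resource module.
DIRECTIVE_TO_RLIMIT = {
    "LimitCPU": "RLIMIT_CPU",
    "LimitNPROC": "RLIMIT_NPROC",
    "LimitRTPRIO": "RLIMIT_RTPRIO",
    "LimitRTTIME": "RLIMIT_RTTIME",
    "LimitNICE": "RLIMIT_NICE",
    "LimitCORE": "RLIMIT_CORE",
    "LimitFSIZE": "RLIMIT_FSIZE",
    "LimitDATA": "RLIMIT_DATA",
    "LimitSTACK": "RLIMIT_STACK",
    "LimitMSGQUEUE": "RLIMIT_MSGQUEUE",
    "LimitAS": "RLIMIT_AS",
    "LimitRSS": "RLIMIT_RSS",
    "LimitMEMLOCK": "RLIMIT_MEMLOCK",
    "LimitSIGPENDING": "RLIMIT_SIGPENDING",
    "LimitNOFILE": "RLIMIT_NOFILE",
    "LimitLOCKS": "RLIMIT_LOCKS",
}


def _fmt(value):
    return "infinity" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {directive: (soft, hard)} as seen by the running process."""
    limits = {}
    for directive, const_name in DIRECTIVE_TO_RLIMIT.items():
        const = getattr(resource, const_name, None)
        if const is None:
            continue  # not available on this platform or Python build
        soft, hard = resource.getrlimit(const)
        limits[directive] = (_fmt(soft), _fmt(hard))
    return limits
```

Run this from ExecStart= of a unit with LimitNOFILE=4096, and it should report 4096 as both the soft and the hard value for LimitNOFILE.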


Handling Timeouts and Abnormal Exits

2 Plan for the unexpected


Directives for Detecting and Responding to Failure

Detecting Failure

● SuccessExitStatus=
○ Whitelist of additional exit codes and signals that indicate a normal exit. Exit code 0 and the signals SIGHUP, SIGINT, SIGTERM, and SIGPIPE always count as clean.

● RestartPreventExitStatus=
○ Blacklist of exit codes and signals that must not trigger a restart. Useful for restarting on most failures but not on unrecoverable ones, such as a bad configuration.

● RestartForceExitStatus=
○ The opposite of the previous option: the listed exit codes and signals always trigger a restart, whatever Restart= says.

● StartLimitInterval= and StartLimitBurst=
○ If the service is started more than StartLimitBurst= times within StartLimitInterval=, further starts are refused and the failure becomes sticky.

Responding to Failure

● Restart=
○ Accepts many options, but on-failure is probably best for most cases.

● FailureAction=
○ Supports options such as rebooting or powering off the system when the service fails.

● StartLimitAction=
○ Same as FailureAction=, but triggered when the StartLimit… thresholds are hit.

● systemctl reset-failed
○ Resets the failed state of units, along with their start-limit counters.
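The directives above combine into a simple decision procedure, sketched below as a toy Python model. It is a simplified model, not systemd's code. It covers Restart=on-failure for process exits only; timeouts and watchdog failures, which also count as failures, are left out. The defaults of 10 seconds and 5 starts mirror systemd's documented start-limit defaults. Exit code 78 in the usage check is a hypothetical "bad configuration" code (EX_CONFIG in sysexits.h).

```python
from collections import deque

# Exit code 0 and these signals always count as a clean exit.
CLEAN_SIGNALS = {"SIGHUP", "SIGINT", "SIGTERM", "SIGPIPE"}


class RestartPolicy:
    """Toy model of Restart=on-failure plus the exit-status and start-limit knobs.

    An exit status is either an int exit code or a signal name like "SIGSEGV".
    """

    def __init__(self, success_exit_status=(), prevent=(), force=(),
                 interval=10.0, burst=5):
        self.success = set(success_exit_status)  # SuccessExitStatus=
        self.prevent = set(prevent)              # RestartPreventExitStatus=
        self.force = set(force)                  # RestartForceExitStatus=
        self.interval = interval                 # StartLimitInterval=, seconds
        self.burst = burst                       # StartLimitBurst=
        self.starts = deque()                    # timestamps of recent starts

    def is_clean(self, status):
        return status == 0 or status in CLEAN_SIGNALS or status in self.success

    def should_restart(self, status):
        if status in self.prevent:
            return False  # checked first, so it wins over force
        if status in self.force:
            return True
        return not self.is_clean(status)  # Restart=on-failure

    def allow_start(self, now):
        """Refuse a start once `burst` starts already fall inside `interval`."""
        while self.starts and now - self.starts[0] >= self.interval:
            self.starts.popleft()
        if len(self.starts) >= self.burst:
            return False  # the unit now stays failed; StartLimitAction= applies
        self.starts.append(now)
        return True
```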


Built-In Service Monitoring with Watchdog

Services

● WatchdogSec=
○ Configures the maximum interval between pings from a healthy service to systemd. If a ping is late, systemd treats the service as failed.

● $WATCHDOG_USEC and $WATCHDOG_PID
○ Environment variables set for a service that is expected to send systemd watchdog pings.

● systemd-notify WATCHDOG=1
○ CLI; the most basic way for a service to send systemd a watchdog ping. It runs as a separate process, so it needs NotifyAccess=all.

● sd_notify(0, "WATCHDOG=1");
○ A better way, which requires linking against libsystemd.

Overall System

● RuntimeWatchdogSec= (in /etc/systemd/system.conf)
○ Configures the maximum interval for systemd to ping the hardware watchdog (if one exists). If the hardware misses an expected ping, it reboots the system.

● ShutdownWatchdogSec=
○ Bounds how long the hardware watchdog waits for a clean shutdown during a reboot before forcing the reboot.
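A service that sets WatchdogSec= has to keep pinging, and the minimal Python sketch below shows one way to do it. It uses the same datagram protocol as sd_notify and pings at half the configured interval, as libsystemd recommends. The is_healthy callback is a hypothetical hook. A ping thread that ignores the state of the real work would keep a hung service looking alive. Because the pings come from a thread of the main process, the default NotifyAccess= is enough.

```python
import os
import socket


def watchdog_interval():
    """Seconds between pings (half of WatchdogSec=), or None if not expected from us."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    pid = os.environ.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        return None  # the watchdog is meant for a different process
    return int(usec) / 1_000_000 / 2


def send_watchdog_ping():
    """Send "WATCHDOG=1" to the socket named in $NOTIFY_SOCKET."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"WATCHDOG=1", addr)
    return True


def watchdog_loop(is_healthy, stop):
    """Ping at half the configured interval, but only while is_healthy() holds.

    `stop` is a threading.Event used to end the loop.
    """
    interval = watchdog_interval()
    if interval is None:
        return
    while not stop.wait(interval):
        if is_healthy():
            send_watchdog_ping()
```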


Dropping Privileges and Access Early

3 Tighten security


Dropping Privileges and Access Early

● Hardening options that mostly just work
○ User=<service-user>
○ PrivateTmp=true
○ PrivateDevices=true
○ ProtectSystem=full
○ ProtectHome=read-only
○ NoNewPrivileges=true
○ MountFlags=private
○ SystemCallArchitectures=native
○ SecureBits=noroot noroot-locked

● Restrict visible directories
○ ReadWriteDirectories=
○ ReadOnlyDirectories=
○ InaccessibleDirectories=
○ RootDirectory= runs the service in a chroot

● Whitelist capabilities and system calls
○ AmbientCapabilities=
○ CapabilityBoundingSet=
○ SystemCallFilter=
○ SystemCallErrorNumber=EPERM makes filtered system calls fail with EPERM instead of killing the process, which is gentler while testing filters

● Control sockets
○ RestrictAddressFamilies=
○ PrivateNetwork=true, which is best combined with socket activation

● Bridge to mandatory access control (MAC)
○ SELinuxContext=
○ AppArmorProfile=
○ SmackProcessLabel=


Monitoring

4 Manage, monitor, and automate


Monitor at the Box Level

Plug a systemctl call into your monitoring tool:

# systemctl --state=failed --all
0 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
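A monitoring check only needs to know whether that list is empty. The minimal Python sketch below wraps the same systemctl call. The --plain and --no-legend flags drop the bullet markers and the footer, so each line starts with a unit name. The alerting policy around the check is left to the monitoring tool, for example going critical when the list is non-empty.

```python
import subprocess


def failed_units(output=None):
    """List failed unit names, as reported by `systemctl --state=failed`.

    Pass `output` to parse captured text instead of calling systemctl.
    """
    if output is None:
        output = subprocess.run(
            ["systemctl", "--state=failed", "--plain", "--no-legend"],
            capture_output=True, text=True, check=True,
        ).stdout
    names = []
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "●":
            fields = fields[1:]  # defensive: skip a leading bullet if present
        if fields:
            names.append(fields[0])
    return names
```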


Automation

4 Manage, monitor, and automate


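Pantheon is a Chef Shop

# Reload systemd whenever the unit file changes, so it picks up the new definition.
execute 'systemctl daemon-reload' do
  action :nothing
end

template '/etc/systemd/system/foo.service' do
  mode '0644'
  source 'foo.service.erb'
  notifies :run, 'execute[systemctl daemon-reload]', :immediately
end

service 'foo.service' do
  provider Chef::Provider::Service::Systemd
  supports :status => true, :restart => true, :reload => true
  action [ :enable, :start ]
end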


Questions? Follow Ups?

Reach out to me @DavidStrauss.

Want to get more hands-on? We’re hiring!

pantheon.io/careers

