
systemd @ Facebook -- a year later

Date post: 23-Jan-2018
Upload: davide-cavalca
Page 1: systemd @ Facebook -- a year later
Page 2: systemd @ Facebook -- a year later

systemd @ FB – a year later

Davide Cavalca, Production Engineer

Page 3: systemd @ Facebook -- a year later

Agenda

• Recap
• Tracking upstream
• Resource management
• Service monitoring
• Case studies
• Advocacy

Page 4: systemd @ Facebook -- a year later

Recap

Page 5: systemd @ Facebook -- a year later
Page 6: systemd @ Facebook -- a year later

Recap: CentOS 7 migration

• 100% of the bare metal fleet on CentOS 7!
• Migrated countless services to systemd
• libsystemd integration in our build system
• Containers: see Zeal’s talk later today!

Page 7: systemd @ Facebook -- a year later

Tracking upstream

Page 8: systemd @ Facebook -- a year later

Tracking upstream: Staying up to date

• systemd 231 → 232 → 233 → (234 → 235)
• Also tracking util-linux, dbus, etc.
• Published our Rawhide-based backports on:
  https://github.com/facebookincubator/rpm-backports
• Binary RPMs based on it on:
  https://copr.fedorainfracloud.org/coprs/jsynacek/systemd-backports-for-centos-7/

Page 9: systemd @ Facebook -- a year later

Tracking upstream: RPM issues

• Not specific to systemd
• Duplicate systemd RPMs: package-cleanup wrapper
• rpmdb corruption: dcrpm
• Mismatch between systemd and systemd-libs

# Reinstall when the systemd binary cannot resolve one of its systemd libraries
if ldd /usr/lib/systemd/systemd | grep 'systemd.*not found$'; then
  yum reinstall -y $systemd_packages
fi

Page 10: systemd @ Facebook -- a year later

Tracking upstream: Meson and compat-libs

• Rebuilt packaging for the Meson transition
• Backported meson, ninja-build in CentOS
• Standalone systemd-compat-libs:
  https://github.com/facebookincubator/systemd-compat-libs

Page 11: systemd @ Facebook -- a year later

Tracking upstream: tty woes with 234

• When rolling 234 we discovered a race in the kernel tty subsystem (repros all the way back to 4.0)
• Turns out both systemd and Tupperware use the real tty0
• Investigation still in progress, likely a use-after-free bug
• Tupperware should probably just use a pty here

Page 12: systemd @ Facebook -- a year later

Resource management

Page 13: systemd @ Facebook -- a year later

Resource management: Rolling out cgroup2

• See Chris’s talk tomorrow for all things cgroup2!
• Using systemd to partition services and apply limits
• Lightweight daemon to collect metrics from /sys/fs/cgroup
• Chef API to apply configurations and manage experiments
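To make the metrics-collection bullet concrete, here is a minimal sketch of reading a few cgroup2 interface files for one slice. It is not the daemon from the talk: the function names, the choice of `memory.current` and `cpu.stat`, and the `/sys/fs/cgroup` mount point are assumptions made for illustration.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup2 mount point


def parse_flat_keyed(text):
    """Parse a cgroup2 'flat keyed' file such as cpu.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def read_slice_metrics(slice_name, root=CGROUP_ROOT):
    """Collect memory.current and cpu.stat for one slice, skipping missing files."""
    base = Path(root) / slice_name
    metrics = {}
    mem = base / "memory.current"
    if mem.exists():
        metrics["memory.current"] = int(mem.read_text().strip())
    cpu = base / "cpu.stat"
    if cpu.exists():
        for key, value in parse_flat_keyed(cpu.read_text()).items():
            metrics["cpu.stat." + key] = value
    return metrics
```

Calling `read_slice_metrics("system.slice")` on a cgroup2 host would return a flat dict that a collector could forward to a monitoring system; on a host without that directory it returns an empty dict.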

Page 14: systemd @ Facebook -- a year later

Resource management: Slice hierarchy

/
|
|-system.slice
|
|-workload.slice
| |
| +-critical-wdb.slice
|
+-tbd.slice

Page 15: systemd @ Facebook -- a year later

Service monitoring

Page 16: systemd @ Facebook -- a year later

Service monitoring: Getting metrics out of systemd

• systemd exposes lots of useful metrics over dbus
• Unit properties (e.g. *Timestamp*, NRestarts)
• Status events (e.g. unit state changes)
• Options: python-dbus, sd-bus, coreos/go-systemd/dbus
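As a rough illustration of reading unit properties, the sketch below shells out to `systemctl show` and parses its `Key=Value` output. This is a deliberately simpler stand-in for the dbus bindings listed above, not what the talk describes; the function names are hypothetical, and `NRestarts` is only reported by systemd versions that expose it.

```python
import subprocess


def parse_show_output(text):
    """Turn 'Key=Value' lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without a '=' separator
            props[key] = value
    return props


def unit_properties(unit, names):
    """Fetch selected properties of a unit by shelling out to systemctl."""
    cmd = ["systemctl", "show", unit, "--property=" + ",".join(names)]
    out = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return parse_show_output(out)
```

For example, `unit_properties("sshd.service", ["NRestarts", "ActiveEnterTimestamp"])` would return both values as strings. A long-running collector would more likely talk to the bus directly to avoid forking a process per poll.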

Page 17: systemd @ Facebook -- a year later

Service monitoring: systemdmon

• Lightweight daemon to feed systemd metrics to various monitoring systems
• Polling for unit properties, subscriptions for status events
• Initial implementation in golang

Page 18: systemd @ Facebook -- a year later

Service monitoring: pystemd

• Thin Cython wrapper on top of sd-bus
• Exposes the systemd dbus object model
• ipython REPL for prototyping
• Will be open-sourced together with systemdmon

Page 19: systemd @ Facebook -- a year later

Case studies

Page 20: systemd @ Facebook -- a year later

Case studies: dbus reliability

• Issues with dbus-daemon or the system bus affect systemd
• systemctl hanging or failing → Chef failing
• Easy to DoS the bus, especially with user services
• Hard to remediate without a reboot
• Looking forward to dbus-broker!

Page 21: systemd @ Facebook -- a year later

Case studies: RPM macros for systemd services

• By default RPM macros will restart units on upgrade...
• ...which is a problem if you’ve also set up Chef to restart them
• Solution: knob in our internal packaging tool to optionally disable the restart macro

Page 22: systemd @ Facebook -- a year later

Case studies: Logging

• Journald setup: 10MB in-memory logging feeding rsyslog
• journalctl is awesome
• Double writing problem
• No way to set per-unit limits

Page 23: systemd @ Facebook -- a year later

Case studies: Unit loops

• Easy to create loops with x-systemd.requires in fstab
• systemd will delete a random unit to break loops
• Solution: add _netdev to the fstab entry
• systemd-analyze to help debugging

systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with smc_proxy.service/start
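To show what an ordering cycle like the one in the log line looks like, here is a small sketch that finds one cycle in a map of "ordered after" relationships using a depth-first search. It is a generic graph routine, not systemd's own algorithm, and it assumes the dependency map has already been assembled by some other means; the unit names in the usage note are hypothetical.

```python
def find_cycle(after):
    """Return one ordering cycle as a list of units, or None if there is none.

    `after` maps each unit to the units it is ordered after.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []

    def visit(unit):
        color[unit] = GREY
        stack.append(unit)
        for dep in after.get(unit, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the path closes a loop
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[unit] = BLACK
        return None

    for unit in list(after):
        if color.get(unit, WHITE) == WHITE:
            cycle = visit(unit)
            if cycle:
                return cycle
    return None
```

With `{"a.service": ["data.mount"], "data.mount": ["b.service"], "b.service": ["a.service"]}` the function returns the loop starting and ending at `a.service`, which is the kind of chain worth checking an fstab entry against before systemd has to break it at boot.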

Page 24: systemd @ Facebook -- a year later

Case studies: Transient unit creep

• systemd-run creates units in /run/systemd/transient
• If the unit fails, it sticks around in ‘failed’ state
• 10k failed units → 50% CPU usage for pid 1
• 30k failed units → 100% CPU usage for pid 1
• Fix: call systemctl reset-failed periodically
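The periodic cleanup could be sketched as follows: count the units in the failed state and run `systemctl reset-failed` once the count passes a threshold. The threshold value and function names are hypothetical, and the talk does not say how the periodic call was actually scheduled.

```python
import subprocess


def count_failed_units(list_units_output):
    """Count non-empty lines of `systemctl list-units --state=failed --no-legend`."""
    return sum(1 for line in list_units_output.splitlines() if line.strip())


def reset_failed_if_needed(threshold=1000):
    """Run `systemctl reset-failed` once the failed-unit count passes threshold.

    The default threshold is an arbitrary illustrative value.
    """
    out = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--no-legend", "--plain"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True).stdout
    failed = count_failed_units(out)
    if failed > threshold:
        subprocess.run(["systemctl", "reset-failed"], check=True)
    return failed
```

Running `reset_failed_if_needed()` from a timer or cron job keeps the failed-unit count bounded. Note that `reset-failed` with no arguments clears the failed state of every unit, so it also hides failures you might want to inspect first.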

Page 25: systemd @ Facebook -- a year later

Case studies: KillMode=process

• KillMode=process may leave stray processes in the cgroup
• Changes to unit slices don’t apply unless the old slice is empty
• Fix: move to KillMode=control-group

Page 26: systemd @ Facebook -- a year later

Case studies: Unit escaping

• Escape logic relies on shell control characters: /dev/dm-1 → dev-dm\x2d1.swap
• Chef fix: https://github.com/chef/chef/pull/6230
• path_to_unit wrapper in fb_systemd
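The escaping rule behind the example above can be approximated in a few lines. This sketch covers the common case of turning a path into a unit name prefix, in the spirit of `systemd-escape --path`; it is not the fb_systemd `path_to_unit` wrapper, it does not handle `.` or `..` path components, and the function names are hypothetical.

```python
import shlex
import string

# Characters that are kept as-is; everything else becomes \xHH.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path):
    """Escape a filesystem path into a systemd unit name prefix."""
    parts = [p for p in path.split("/") if p]  # drop empty components
    if not parts:
        return "-"  # the root directory maps to a single dash
    out = []
    for i, byte in enumerate("/".join(parts).encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


def shell_quoted_unit(path, suffix):
    """Build the unit name and quote it so the backslashes survive a shell."""
    return shlex.quote(escape_path(path) + suffix)
```

`escape_path("/dev/dm-1")` yields `dev-dm\x2d1`. The backslash is what makes these names fragile when they are interpolated into shell commands unquoted, so `shell_quoted_unit` wraps the result in single quotes.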

Page 27: systemd @ Facebook -- a year later

Advocacy

Page 28: systemd @ Facebook -- a year later

Advocacy

• Announce core package updates widely
• Tailor documentation to customer use cases
• Encourage people to engage upstream directly
• Tech talks

Page 29: systemd @ Facebook -- a year later

Questions?

Page 30: systemd @ Facebook -- a year later
