Docker operation pitfalls

A Datadog Brownbag

Agenda
Sharing tips and pitfalls from my experience running test clusters and supporting users

What's really a container?
What should I monitor?
  CPU
  Memory
  Network
  Disk
  Orchestrator stats

What's a container?
It's a process
And its subprocesses
Isolated from the rest of the system
With containerization technologies

Regular process tree

systemd─┬─systemd───(sd-pam)
        ├─systemd-journal
        ├─systemd-logind
        ├─systemd-udevd
        ├─ ...
        ├─redis-server───2*[{redis-server}]
        ├─ ...
        ├─nginx───4*[nginx]
        ├─ ...

Dockerized process tree

systemd─┬─systemd───(sd-pam)
        ├─systemd-journal
        ├─systemd-logind
        ├─systemd-udevd
        ├─ ...
        ├─dockerd─┬─docker-containerd─┬─redis-server───3*[{redis-server}]
        │         │                   └─9*[{docker-containerd-shim}]
        │         ├─ ...
        │         └─docker-containerd─┬─nginx───nginx
        │                             └─9*[{docker-containerd-shim}]
        ├─ ...

So, what's the difference?

Cgroups for resource allocation
  cpu
  cpuacct
  memory
  blkio
  net_prio
  $ ls /sys/fs/cgroup/ for more

Namespaces for isolation
  mnt
  pid
  net
  user
  $ ls /proc/self/ns for more

These can be applied to regular processes
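
The namespace entries listed under /proc/self/ns can also be read programmatically, which makes it easy to check whether two processes are isolated from each other. A minimal Python sketch, assuming a Linux /proc layout where each entry is a symlink whose target looks like `net:[4026531992]`; the helper names `read_namespaces` and `same_namespace` are illustrative and not part of Docker or the Datadog agent.

```python
import os
import re

# Symlink targets in /proc/<pid>/ns look like "net:[4026531992]".
NS_LINK = re.compile(r"^\w+:\[(?P<inode>\d+)\]$")


def read_namespaces(ns_dir="/proc/self/ns"):
    """Return {entry name: namespace inode} for the symlinks in ns_dir."""
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        if not os.path.islink(path):
            continue
        match = NS_LINK.match(os.readlink(path))
        if match:
            namespaces[name] = int(match.group("inode"))
    return namespaces


def same_namespace(first, second, kind):
    """True when both processes share the namespace of the given kind."""
    return kind in first and kind in second and first[kind] == second[kind]
```

Comparing `read_namespaces("/proc/1/ns")` with `read_namespaces("/proc/<pid>/ns")` for a containerized PID would typically show different `net`, `pid` and `mnt` inodes, which is all the isolation amounts to.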

cpu cgroup & CPU metrics
Limit tells how many CPU cores a cgroup can use
If limit exceeded, the cgroup is frozen (throttled) for the rest of the scheduling period ❄
  Pretty unintrusive for your application
  Can impact your application's performance
  Must monitor docker.cpu.throttled to see if that happens
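
Throttling leaves a trace in the cpu cgroup's cpu.stat file. A minimal Python sketch, assuming the cgroup v1 layout where cpu.stat exposes `nr_periods`, `nr_throttled` and `throttled_time`; the sample values are made up, and the helper names are illustrative rather than how the Datadog agent computes docker.cpu.throttled.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(before, after):
    """Fraction of scheduling periods in which the cgroup was throttled."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


# Made-up samples taken a few seconds apart (not from a real host).
SAMPLE_BEFORE = "nr_periods 100\nnr_throttled 10\nthrottled_time 5000000\n"
SAMPLE_AFTER = "nr_periods 200\nnr_throttled 100\nthrottled_time 95000000\n"
```

A ratio close to 1 between two samples means the cgroup hit its limit in almost every period, which is when application latency starts to suffer.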

Let's test it

version: '2.2'
services:
  cpuburn-nolimit:
    image: alpine:3.6
    command: "dd if=/dev/zero of=/dev/null"
  cpuburn-10percent:
    image: alpine:3.6
    command: "dd if=/dev/zero of=/dev/null"
    cpus: 0.1
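
To make `cpus: 0.1` concrete: a fractional CPU limit is enforced as a run-time quota per scheduling period. A small sketch of that arithmetic, assuming the compose `cpus` key behaves like `docker run --cpus` with the default 100 ms period; the function names are illustrative.

```python
DEFAULT_PERIOD_US = 100_000  # assumed default scheduling period (100 ms)


def cfs_quota_us(cpus, period_us=DEFAULT_PERIOD_US):
    """Run time (in microseconds) the cgroup may consume per period."""
    return int(round(cpus * period_us))


def expected_throttled_share(cpus, demand_cpus=1.0):
    """Share of each period a process wanting demand_cpus spends frozen."""
    if demand_cpus <= cpus:
        return 0.0
    return 1.0 - cpus / demand_cpus
```

Under these assumptions the `cpuburn-10percent` service gets 10 ms of CPU per 100 ms period, so a busy loop like `dd` would sit frozen for roughly 90% of the time.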

mem cgroup & memory metrics
Limit tells how much RAM / RAM+swap a cgroup can use
If limit exceeded, the OOM killer descends upon your cgroup
  Can kill PID 1 (which leads to an OOM exit)
  Or not... which can leave the container stuck in a non-working state
  A common issue with docker-dd-agent (forwarder killed but collector still running)
Must pre-emptively monitor docker.mem.in_use and docker.mem.sw_in_use to see if that could happen
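
The "in use" idea boils down to usage divided by limit. A minimal Python sketch, assuming the two numbers come from the cgroup v1 files memory.usage_in_bytes and memory.limit_in_bytes; the `UNLIMITED_CUTOFF` constant, the 0.9 threshold and the function names are hypothetical choices for illustration, not the agent's actual logic.

```python
# Hypothetical cut-off: an unlimited cgroup reports a huge limit value,
# which should not be turned into a meaningless ratio.
UNLIMITED_CUTOFF = 2 ** 62


def mem_in_use(usage_bytes, limit_bytes):
    """Return usage/limit, or None when the cgroup has no real limit."""
    if limit_bytes <= 0 or limit_bytes >= UNLIMITED_CUTOFF:
        return None
    return usage_bytes / limit_bytes


def should_alert(usage_bytes, limit_bytes, threshold=0.9):
    """True when the cgroup is close enough to its limit to risk an OOM kill."""
    ratio = mem_in_use(usage_bytes, limit_bytes)
    return ratio is not None and ratio >= threshold
```

Alerting on the ratio rather than on absolute usage lets the same monitor cover containers with very different limits.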

Let's test it

services:
  memleak-pid1:
    image: alpine:3.6
    command: "ash -c 'for i in `seq 1 10000000`; do true; done'"
    mem_limit: 10000000
    restart: on-failure
  memleak-forked:
    image: alpine:3.6
    command: "ash -c \"ash -c 'for i in `seq 1 10000000`; do true; done' & sleep 20\""
    cpus: 0.1
    mem_limit: 10000000
    restart: on-failure

ubuntu@ci-xaviervello:~$ docker-compose -f composes/memlimit.compose up
memleak-pid1_1 | Killed
composes_memleak-pid1_1 exited with code 137
memleak-pid1_1 | Killed
...
memleak-forked_1 | Killed
memleak-pid1_1 | Killed
...
composes_memleak-forked_1 exited with code 0 ✅
composes_memleak-pid1_1 exited with code 137
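
The two exit codes in this output tell different stories: 137 follows the usual "128 + signal number" convention (9 is SIGKILL), while 0 means the container's PID 1 ended normally even though its child was killed. A small Python sketch of that decoding, assuming a POSIX host; `describe_exit_code` is an illustrative helper.

```python
import signal


def describe_exit_code(code):
    """Turn a container exit code into a short human-readable description."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "signal %d" % signum
        return "killed by " + name
    if code == 0:
        return "exited normally"
    return "exited with error %d" % code
```

This is why `memleak-forked` looks healthy from the outside: its exit code says nothing about the forked process that was OOM-killed.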

What should I do?
Run only one program per container (or use a robust supervisor)
Have a relevant healthcheck
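
One way to make a healthcheck "relevant" for a multi-process container is to verify that every expected process is still alive, not just PID 1. A minimal Python sketch, assuming a Linux /proc where /proc/<pid>/comm holds the command name; the process names in the usage note are hypothetical, and the 0 / 1 return values follow the exit status convention Docker healthchecks use (0 healthy, 1 unhealthy).

```python
import os


def running_commands(proc_root="/proc"):
    """Collect the command names of all processes visible under proc_root."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                names.add(handle.read().strip())
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return names


def healthcheck(required, proc_root="/proc"):
    """Return (status, missing): status is 0 when all required commands run."""
    missing = sorted(set(required) - running_commands(proc_root))
    return (0 if not missing else 1), missing
```

For a container running two cooperating daemons, `healthcheck(["forwarder", "collector"])` would report 1 as soon as either of them disappears, which is the stuck state described earlier.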

net namespace & net metrics
Every container has its own network namespace
  Its own virtual eth0 with a private IP, bridged by the host
Allows isolation and routing
Allows us to collect per-container metrics

$ docker exec agent5_agent5-release_1 cat /host/proc/30828/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo: 6961740   50619    0    0    0     0          0         0
  eth0: 19558170   37932    0    0    0     0          0         0
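
Turning this output into per-interface metrics is a matter of splitting each line at the colon. A minimal Python sketch that parses only the receive columns shown on the slide (the real file carries additional transmit columns after them); `parse_net_dev` and `RX_FIELDS` are illustrative names, and the sample is the output above.

```python
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")

SAMPLE = """\
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo: 6961740   50619    0    0    0     0          0         0
  eth0: 19558170   37932    0    0    0     0          0         0
"""


def parse_net_dev(text):
    """Return {interface: {receive field: value}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        iface, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < len(RX_FIELDS):
            continue
        counters = [int(value) for value in values[:len(RX_FIELDS)]]
        stats[iface.strip()] = dict(zip(RX_FIELDS, counters))
    return stats
```

Since the counters are cumulative, a collector would sample them twice and report the difference divided by the interval.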

$ cat /proc/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
enp0s8: 61509004  104768    0    0    0     0          0         0
enp0s3: 523131054  862084    0    0    0     0          0         0
    lo:    2952      46    0    0    0     0          0         0
veth4642e27: 6763827   33666    0    0    0     0          0         0
...

$ docker exec agent5_agent5-release_1 cat /proc/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo: 6656783   48403    0    0    0     0          0         0
  eth0: 18699348   36269    0    0    0     0          0         0

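$ docker exec agent5_agent5-release_1 cat /host/proc/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo: 6697512   48702    0    0    0     0          0         0
  eth0: 18812503   36489    0    0    0     0          0         0

Still the container's own interfaces: /proc/net is a symlink to /proc/self/net, so it resolves to the network namespace of the process reading it, even through the /host/proc mount.

How to get the host's metrics?
Run docker-dd-agent with net=host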

Investigating using /host/proc/1/net/dev in agent6

$ docker exec agent5_agent5-release_1 cat /proc/1/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
    lo: 6882233   50038    0    0    0     0          0         0
  eth0: 19332873   37499    0    0    0     0          0         0

$ docker exec agent5_agent5-release_1 cat /host/proc/1/net/dev
Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
enp0s8: 61509280  104771    0    0    0     0          0         0
enp0s3: 523996193  864400    0    0    0     0          0         0
    lo:    2952      46    0    0    0     0          0         0
veth4642e27: 7017324   34939    0    0    0     0          0         0
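
The two outputs differ because /proc/<pid>/net reflects the network namespace of that PID: the container's PID 1 in the first case, the host's PID 1 (seen through the /host/proc mount) in the second. A minimal Python sketch of how a collector could pick its source, assuming the host's /proc is mounted at /host/proc as in the commands above; `host_net_dev_path` is a hypothetical helper, not the agent's actual code.

```python
import os


def host_net_dev_path(host_proc="/host/proc"):
    """Return (path, is_host_view) for the best available net/dev file.

    Prefers the host PID 1 view when the host's /proc is mounted, and falls
    back to the container-local file otherwise.
    """
    candidate = os.path.join(host_proc, "1", "net", "dev")
    if os.path.isfile(candidate):
        return candidate, True
    return "/proc/net/dev", False
```

The boolean makes the fallback explicit, so metrics read from the container's own namespace are not mistaken for host metrics.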

mnt namespace & disk metrics
By default, containers don't have access to the host's filesystem (duh!)
Free space is a property of the filesystem, not the block device
To get metrics on a filesystem, you need to bind-mount it into the container, see KB

Please monitor free space in /var/lib/docker!

$ docker system df
TYPE            TOTAL   ACTIVE  SIZE     RECLAIMABLE
Images          51      6       4.52GB   3.946GB (87%)
Containers      13      2       125MB    73.74MB (58%)
Local Volumes   47      3       1.05kB   1.05kB (100%)
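
Because free space belongs to the filesystem, it can be checked on any path that lives on it. A minimal Python sketch using the standard library's `shutil.disk_usage`; the 10% threshold is an arbitrary illustration, and pointing it at /var/lib/docker from inside a container assumes that directory has been bind-mounted in.

```python
import shutil


def free_ratio(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return None  # pseudo-filesystems may report no capacity
    return usage.free / usage.total


def low_on_space(path, min_free=0.1):
    """True when less than min_free of the filesystem remains available."""
    ratio = free_ratio(path)
    return ratio is not None and ratio < min_free
```

A check such as `low_on_space("/var/lib/docker")` would be a reasonable trigger for pruning the reclaimable images reported by `docker system df`.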

Orchestrator metrics
https://www.datadoghq.com/blog/monitor-kubernetes-docker/
mesos-master and mesos-slave
kubernetes, kubernetes_state

Alert on high level objects (deployments, daemonsets, tasks...), drill down to the container-level for investigation

Questions?

Thanks!
Get it at github.com/xvello/decks/tree/master/201711-datadog-brownbag/pdf

Next week
Aaditya Talwai
Team Raclette
APM intake/storage/query pipeline
