See Docker from the Perspective of Linux Process

Allen Sun@DaoCloud Hangzhou Docker Meetup

2015.03.14

Agenda

1. Prerequisite
    Linux Process (do_fork / copy_process)
    Namespaces

2. How Docker deals with processes
    dockerinit, ENTRYPOINT, CMD

syscall: fork()

Process A (parent, original PID)
    fork()
    Process A continues
    wait()
    SIGCHLD
    clean up

Process B (child, new PID)
    execve(): executes a different program!
    exit()
    ZOMBIE (until the parent calls wait())

Reference: http://www.lynx.com/the-fork-call-posix-processes-and-parent-child-relationships
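
The parent/child life cycle above can be tried from Go. This is a minimal sketch, not code from Docker: Go does not expose a bare fork(), so `os/exec` performs the fork-and-exec pair for us. The helper name `runChild` and the `sh -c "exit 3"` child are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// runChild (hypothetical helper) starts name with args as a child process,
// waits for it, and returns the child's PID and exit code.
func runChild(name string, args ...string) (int, int, error) {
	cmd := exec.Command(name, args...)

	// Start creates a new process (new PID) that immediately executes
	// a different program: the fork + exec pair from the slide.
	if err := cmd.Start(); err != nil {
		return 0, 0, err
	}
	pid := cmd.Process.Pid

	// Wait blocks until the child exits and reaps it, so the child
	// does not stay around as a zombie.
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		return pid, 0, err
	}
	// A non-zero exit status is reported through the exit code, not as an error.
	return pid, cmd.ProcessState.ExitCode(), nil
}

func main() {
	pid, code, err := runChild("sh", "-c", "exit 3")
	if err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
		os.Exit(1)
	}
	fmt.Printf("parent PID %d, child PID %d, child exit code %d\n", os.Getpid(), pid, code)
}
```

The parent keeps its original PID, the child gets a new one, and the exit code only becomes available once the parent has waited.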

do_fork
    copy_process
    determine PID
    wake_up_new_task
    wait_for_completion (only with CLONE_VFORK)

copy_process
    check flags
    dup and init task_struct
    check resource limit
    copy/share process details: copy_semundo, copy_namespaces, ……
    set IDs, task relationships, etc.
    ……

Reference: Mauerer W. Professional Linux Kernel Architecture, Figure 2-7 and Figure 2-8. John Wiley & Sons, 2010.

task_struct and namespaces

struct task_struct { …… struct nsproxy *nsproxy; …… };

struct nsproxy { …… struct uts_namespace *uts_ns; struct mnt_namespace *mnt_ns; struct net *net_ns; …… };

uts_ns -> struct uts_namespace
mnt_ns -> struct mnt_namespace
net_ns -> struct net

nsproxy holds pointers to 5 kinds of namespaces for a process:

1. uts_namespace
2. mnt_namespace
3. pid_namespace
4. ipc_namespace
5. net

user_namespace is not in nsproxy!

Based on Linux kernel 3.13
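
From user space, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns on Linux. The sketch below reads them for the current process. The helper `parseNsLink` is a hypothetical name, and the link format `kind:[inode]` is what current kernels print; two processes in the same namespace show the same inode number.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink (hypothetical helper) splits a link target such as
// "uts:[4026531838]" into the namespace kind and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	i := strings.Index(target, ":[")
	if i < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(target[i+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:i], inode, nil
}

func main() {
	// The five nsproxy namespaces plus the user namespace.
	for _, name := range []string{"uts", "mnt", "pid", "ipc", "net", "user"} {
		target, err := os.Readlink("/proc/self/ns/" + name)
		if err != nil {
			// Not on Linux, or the kernel lacks this namespace type.
			fmt.Fprintln(os.Stderr, "skip:", err)
			continue
		}
		kind, inode, err := parseNsLink(target)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skip:", err)
			continue
		}
		fmt.Printf("%-4s namespace inode %d\n", kind, inode)
	}
}
```

Running it on the host and again inside a container should print different inode numbers for every namespace the container does not share with the host.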

What is in namespaces?

struct pid_namespace {
    …
    struct task_struct *child_reaper;
    …
    int level;
    struct pid_namespace *parent;
};

struct mnt_namespace {
    atomic_t count;
    struct mount *root;
    struct list_head list;
    ……
};

struct uts_namespace {
    struct kref kref;
    struct new_utsname name;
    struct user_namespace *user_ns;
    ……
};

struct new_utsname {
    char sysname[..];
    char nodename[..];
    char release[..];
    char version[..];
    char machine[..];
    char domainname[..];
};

……

Based on Linux kernel 3.13

Docker? Where is Docker?

Docker Client -> Docker Daemon -- fork! --> Docker Container, Docker Container, ……

fork: do_fork -> copy_process -> copy_namespaces
exec: do_execve

A Docker container is born just by forking and exec'ing a process!

Difference (Docker’s fork vs normal fork)

Special flags passed to clone() and handled in do_fork():

flag name        Linux kernel version
CLONE_NEWNS      2.4.19
CLONE_NEWUTS     2.6.19
CLONE_NEWIPC     2.6.19
CLONE_NEWPID     2.6.24
CLONE_NEWNET     2.6.29
CLONE_NEWUSER    3.8

Namespaces in Docker

func init() {
    namespaceList = Namespaces{
        {Key: "NEWNS", Value: syscall.CLONE_NEWNS, File: "mnt"},
        {Key: "NEWUTS", Value: syscall.CLONE_NEWUTS, File: "uts"},
        {Key: "NEWIPC", Value: syscall.CLONE_NEWIPC, File: "ipc"},
        {Key: "NEWUSER", Value: syscall.CLONE_NEWUSER, File: "user"},
        {Key: "NEWPID", Value: syscall.CLONE_NEWPID, File: "pid"},
        {Key: "NEWNET", Value: syscall.CLONE_NEWNET, File: "net"},
    }
}

Based on libcontainer v1.2.0
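
To make "fork with special flags" concrete, here is a small sketch of how a list like the one above could be folded into a single clone flag word. It is an illustration, not libcontainer's code: `cloneFlags` and `flagByKey` are hypothetical names, and the numeric values are copied from the Linux header <linux/sched.h> so that the sketch also compiles off Linux (on Linux, Go's syscall package exports the same constants).

```go
package main

import (
	"fmt"
	"os"
)

// Flag values as defined in <linux/sched.h>.
const (
	cloneNewNS   uintptr = 0x00020000
	cloneNewUTS  uintptr = 0x04000000
	cloneNewIPC  uintptr = 0x08000000
	cloneNewUser uintptr = 0x10000000
	cloneNewPID  uintptr = 0x20000000
	cloneNewNet  uintptr = 0x40000000
)

// flagByKey mirrors the Key/Value pairs of the namespace list on the slide.
var flagByKey = map[string]uintptr{
	"NEWNS":   cloneNewNS,
	"NEWUTS":  cloneNewUTS,
	"NEWIPC":  cloneNewIPC,
	"NEWUSER": cloneNewUser,
	"NEWPID":  cloneNewPID,
	"NEWNET":  cloneNewNet,
}

// cloneFlags (hypothetical helper) ORs together the flags for the
// requested namespace keys.
func cloneFlags(keys ...string) (uintptr, error) {
	var flags uintptr
	for _, k := range keys {
		f, ok := flagByKey[k]
		if !ok {
			return 0, fmt.Errorf("unknown namespace key %q", k)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	// Every namespace except the user namespace.
	flags, err := cloneFlags("NEWNS", "NEWUTS", "NEWIPC", "NEWPID", "NEWNET")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("clone flags: %#x\n", flags)
}
```

On Linux, a value like this can be assigned to the `Cloneflags` field of `syscall.SysProcAttr` before starting a command, which usually requires root privileges.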

USER_NAMESPACE: not fully implemented in Docker

NET_NAMESPACE: not used in network modes "host" and "other container"

What to Fork?

Docker Client -> Docker Daemon -- fork with flags! --> ? ? …… Docker Container

Fork a Docker container?

Docker Container == Process(es)?

What Process to Fork?

Whatever! It just has to be a process.

The process has only been forked, not exec'ed yet.

The result looks like this:

task_struct ready

namespaces ready

other resources ready

The process is still a copy of its parent; the container's own program is not running yet.

Then exec! Exec what? Have you ever heard of dockerinit, ENTRYPOINT or CMD in Docker?

name         description
dockerinit   The init program that runs first inside the new namespaces to set up the mount and net namespaces, among other things.
ENTRYPOINT   An ENTRYPOINT allows you to configure a container that will run as an executable.
CMD          The main purpose of a CMD is to provide defaults for an executing container.

Reference: https://docs.docker.com/reference/builder

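Dockerinit, ENTRYPOINT, CMD

Docker Daemon process -- fork (new namespaces) --> child process

child process -- exec --> 1. dockerinit (init namespaces) -- exec --> 2. ENTRYPOINT -- exec --> 3. CMD

dockerinit, ENTRYPOINT and CMD run one after another in the only process (same PID).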

Docker Daemon and dockerinit

Docker Daemon (parent) -- syncPipe --> dockerinit (child)

Usage: coordinate the sequencing of Docker Daemon and dockerinit.

dockerinit blocks as long as there is nothing to read from syncPipe.

Why?

How to coordinate?

Docker Daemon (parent), with the state of dockerinit (child) at each step:

1. Create Command: the executable in the container (dockerinit)
2. Create syncPipe
3. Pass pipe to child
4. command.Start(): fork and exec the command
   dockerinit: fork, new PID! syncPipe (nothing): blocked
5. SetupCgroups
   dockerinit: syncPipe (nothing): blocked, controlled by cgroup
6. Init network
   dockerinit: syncPipe (nothing): blocked, controlled by cgroup
7. Sync with child
   dockerinit: syncPipe (has networkState): read from syncPipe

Based on libcontainer v1.2.0
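
The hand-off in steps 2 to 7 can be sketched with a plain pipe. This is a simplified illustration under stated assumptions, not libcontainer's implementation: `startAndSync` and the `networkState` fields are hypothetical, the JSON encoding is assumed, and `sh -c "cat <&3"` stands in for dockerinit by blocking on the inherited pipe (file descriptor 3) until the parent writes and closes it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// networkState is a hypothetical stand-in for the network details
// the daemon hands to the child.
type networkState struct {
	VethHost  string `json:"veth_host"`
	VethChild string `json:"veth_child"`
}

// startAndSync starts a child that blocks on the sync pipe, runs setup
// while the child is blocked, then releases the child by sending state.
// It returns what the child read from the pipe.
func startAndSync(state networkState, setup func(pid int) error) (string, error) {
	r, w, err := os.Pipe() // step 2: create syncPipe
	if err != nil {
		return "", err
	}

	child := exec.Command("sh", "-c", "cat <&3") // step 1: create the command
	child.ExtraFiles = []*os.File{r}              // step 3: pipe becomes fd 3 in the child
	var out bytes.Buffer
	child.Stdout = &out

	if err := child.Start(); err != nil { // step 4: fork and exec
		r.Close()
		w.Close()
		return "", err
	}
	r.Close() // the parent only keeps the write end

	// Steps 5 and 6: the child is blocked in read() while the parent works.
	setupErr := setup(child.Process.Pid)
	if setupErr == nil {
		// Step 7: sync with the child by sending the network state.
		setupErr = json.NewEncoder(w).Encode(state)
	}
	w.Close() // EOF on the pipe releases the child in every case

	waitErr := child.Wait() // step 8 on the daemon side
	if setupErr != nil {
		return "", setupErr
	}
	if waitErr != nil {
		return "", waitErr
	}
	return out.String(), nil
}

func main() {
	state := networkState{VethHost: "veth1234", VethChild: "veth5678"}
	got, err := startAndSync(state, func(pid int) error {
		fmt.Printf("child %d is blocked; cgroups and network would be set up here\n", pid)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "sync failed:", err)
		os.Exit(1)
	}
	fmt.Printf("child read from syncPipe: %s", got)
}
```

Because the child cannot proceed past its read, anything the parent does in `setup` is guaranteed to be in place before the child continues.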

How to coordinate?

dockerinit (child):

1. SetupNetwork
2. SetupRoute
3. Init mount ns: set up devices, mount points and fs
4. Apply AppArmor
5. execv ENTRYPOINT (exec, same PID!)
x. execv CMD (exec, same PID!): finally, YOUR APP!

Docker Daemon (parent):

8. command.Wait()

Based on libcontainer v1.2.0
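
"exec, same PID" is easy to verify. The sketch below is only an illustration: it asks a shell to print its PID, replace itself with a second shell through the `exec` builtin, and print the PID again. The helper name `pidsAcrossExec` is hypothetical. In a Go program such as dockerinit, the corresponding call on Linux would be `syscall.Exec`, which replaces the calling process and does not return on success.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// pidsAcrossExec (hypothetical helper) runs a shell that prints its PID,
// then replaces itself with a second shell via exec, which prints its own
// PID. It returns both values as strings.
func pidsAcrossExec() (string, string, error) {
	// The inner $$ is single-quoted, so it is expanded by the second shell.
	out, err := exec.Command("sh", "-c", "echo $$; exec sh -c 'echo $$'").Output()
	if err != nil {
		return "", "", err
	}
	fields := strings.Fields(string(out))
	if len(fields) != 2 {
		return "", "", fmt.Errorf("unexpected output %q", out)
	}
	return fields[0], fields[1], nil
}

func main() {
	before, after, err := pidsAcrossExec()
	if err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
		os.Exit(1)
	}
	fmt.Printf("PID before exec: %s, PID after exec: %s\n", before, after)
}
```

Both numbers are the same: exec swaps the program image, while the process, its namespaces and its cgroup membership stay in place.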

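Docker Container

Docker Daemon process -- fork (new namespaces) --> child process (cgroups applied)

child process -- exec --> 1. dockerinit (init namespaces) -- exec --> 2. ENTRYPOINT -- exec --> 3. CMD (your application)

dockerinit, ENTRYPOINT and CMD run one after another in the only process (same PID).

Docker Container: process, process, process, process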

Why Coordinate?

1. Docker Daemon needs to synchronize with dockerinit: dockerinit is blocked until the cgroups are applied, so that no children of dockerinit can escape from the cgroups.

2. Namespaces cannot be switched inside the Go runtime: dockerinit is blocked until Docker Daemon transfers the network details that will be used to set up the network interface in the new net namespace.

Q&A

THANK YOU !

Email: allen.sun@daocloud.io  Weibo: @莲子弗如清  WeChat: shlallen