
Resource management: Linux kernel Namespaces and cgroupsrich/class/old.cs290/papers/... ·...

Date post: 22-Jul-2020
Rami Rosen
[email protected]
Haifux, May 2013
www.haifux.org
http://ramirose.wix.com/ramirosen
Page 1: Resource management: Linux kernel Namespaces and cgroupsrich/class/old.cs290/papers/... · 2014-05-15 · The presentation deals with two Linux process resource management solutions:



TOC

● UTS namespace
● Network Namespace
● Mount namespace
● PID namespaces
● user namespaces
● cgroups
● Mounting cgroups
● links

Note: All code examples are from the for_3_10 branch of the cgroup git tree (3.9.0-rc1, April 2013).


General

The presentation deals with two Linux process resource management solutions: namespaces and cgroups.

We will look at:

● Kernel implementation details.
– What was added/changed, in brief.
● User space interface.
● Some working examples.
● Usage of namespaces and cgroups in other projects.
● Is process virtualization indeed lightweight compared to OS virtualization?
– Compared to VMware/qemu/ScaleMP, or even to Xen/KVM.


Namespaces

● Namespaces - lightweight process virtualization.

– Isolation: Enable a process (or several processes) to have different

views of the system than other processes.

– 1992: “The Use of Name Spaces in Plan 9”

– http://www.cs.bell-labs.com/sys/doc/names.html

● Rob Pike et al, ACM SIGOPS European Workshop 1992.

– Much like Zones in Solaris.

– No hypervisor layer (as in OS virtualization like KVM, Xen)

– Only one system call was added (setns())

– Used in Checkpoint/Restart

● Developers: Eric W. Biederman, Pavel Emelyanov, Al Viro, Cyrill Gorcunov, more.


Namespaces - contd

There are currently 6 namespaces:

● mnt (mount points, filesystems)

● pid (processes)

● net (network stack)

● ipc (System V IPC)

● uts (hostname)

● user (UIDs)


Namespaces - contd

It was intended that there would be 10 namespaces; the following 4 namespaces are not implemented (yet):

● security namespace

● security keys namespace

● device namespace

● time namespace.

– There was a time namespace patch – but it was not applied.

– See: PATCH 0/4 - Time virtualization:

– http://lwn.net/Articles/179825/

● See OLS 2006: "Multiple Instances of the Global Linux Namespaces", Eric W. Biederman.


Namespaces - contd

● Mount namespaces were the first type of namespace to be

implemented on Linux by Al Viro, appearing in 2002.

– Linux 2.4.19.

● The CLONE_NEWNS flag was added (it stands for “new namespace”; at that time no other namespace was planned, so it was not called new mount...).

● The user namespace was the last to be implemented. A number of Linux filesystems are not yet user-namespace aware.


Implementation details

● Implementation (partial):

– 6 CLONE_NEW* flags were added (include/linux/sched.h).

● These flags (or a combination of them) can be used in the clone() or unshare() syscalls to create a namespace.

● In setns(), the flags are optional.


Flag            Kernel version   Required capability
CLONE_NEWNS     2.4.19           CAP_SYS_ADMIN
CLONE_NEWUTS    2.6.19           CAP_SYS_ADMIN
CLONE_NEWIPC    2.6.19           CAP_SYS_ADMIN
CLONE_NEWPID    2.6.24           CAP_SYS_ADMIN
CLONE_NEWNET    2.6.29           CAP_SYS_ADMIN
CLONE_NEWUSER   3.8              No capability is required
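The table can be mirrored in user space with a small lookup. This sketch maps the entry names used under /proc/<pid>/ns to the CLONE_NEW* constants, so that a set of names can be turned into the flags argument for clone() or unshare(). The helper names ns_flag_from_name() and ns_flags_from_names() are hypothetical. The code assumes glibc's <sched.h>, which exposes these constants when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>    /* CLONE_NEW* constants (glibc, needs _GNU_SOURCE) */
#include <stddef.h>
#include <string.h>

/* Hypothetical table: /proc/<pid>/ns entry name -> clone flag. */
struct ns_flag {
    const char *name;
    int flag;
};

static const struct ns_flag ns_flags[] = {
    { "mnt",  CLONE_NEWNS   },   /* 2.4.19 */
    { "uts",  CLONE_NEWUTS  },   /* 2.6.19 */
    { "ipc",  CLONE_NEWIPC  },   /* 2.6.19 */
    { "pid",  CLONE_NEWPID  },   /* 2.6.24 */
    { "net",  CLONE_NEWNET  },   /* 2.6.29 */
    { "user", CLONE_NEWUSER },   /* 3.8 */
};

/* Returns the CLONE_NEW* flag for name, or 0 if the name is unknown. */
static int ns_flag_from_name(const char *name)
{
    for (size_t i = 0; i < sizeof(ns_flags) / sizeof(ns_flags[0]); i++)
        if (strcmp(ns_flags[i].name, name) == 0)
            return ns_flags[i].flag;
    return 0;
}

/* ORs the flags for a NULL-terminated list of names; -1 if any is unknown. */
static int ns_flags_from_names(const char *const names[])
{
    int flags = 0;

    for (size_t i = 0; names[i] != NULL; i++) {
        int f = ns_flag_from_name(names[i]);
        if (f == 0)
            return -1;
        flags |= f;
    }
    return flags;
}
```

For example, the list {"net", "uts", NULL} yields CLONE_NEWNET | CLONE_NEWUTS. An unknown name such as "time" makes the whole lookup return -1.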


Implementation - contd

● Three system calls are used for namespaces:

● clone() - creates a new process and a new namespace; the

process is attached to the new namespace.

– The process creation and process termination methods, fork() and exit(), were patched to handle the new CLONE_NEW* flags.

● unshare() - does not create a new process; creates a new

namespace and attaches the current process to it.

– unshare() was added in 2005, not only for namespaces but also for security. See “new system call, unshare”: http://lwn.net/Articles/135266/

● setns() - a new system call, added for joining an existing namespace.
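Here is a minimal sketch of the first of these calls, assuming glibc's clone() wrapper. The child is created in a new UTS namespace and changes its hostname there. It then reports through its exit status whether it sees the new name. The names run_child_in_new_uts() and CHILD_STACK_SIZE are hypothetical. Without CAP_SYS_ADMIN, the clone() call is expected to fail (typically with EPERM), and the function then returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>      /* clone(), CLONE_NEWUTS */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <string.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>     /* sethostname(), gethostname() */

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs in the child, which lives in its own (new) UTS namespace. */
static int child_fn(void *arg)
{
    const char *name = arg;
    char buf[256];

    if (sethostname(name, strlen(name)) == -1)
        return 1;
    if (gethostname(buf, sizeof(buf)) == -1)
        return 2;
    return strcmp(buf, name) == 0 ? 0 : 3;   /* 0: child sees its own name */
}

/*
 * Creates a child in a new UTS namespace with clone() and lets it change
 * its hostname. Returns the child's exit status, or -1 if clone() (or
 * waiting for the child) failed.
 */
static int run_child_in_new_uts(const char *name)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures: pass its top. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_NEWUTS | SIGCHLD, (void *)name);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);
    if (w == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Either way, the parent's hostname is left untouched; that is the isolation the UTS namespace provides.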


Nameless namespaces

From man (2) clone:

...

int clone(int (*fn)(void *), void *child_stack,

int flags, void *arg, ...

/* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

...

● flags is a combination of the CLONE_* flags, including the namespace CLONE_NEW* flags. There are more than 20 flags in total.

● See include/uapi/linux/sched.h

● There is no parameter for a namespace name.

● How do we know if two processes are in the same namespace?

● Namespaces do not have names.

● Six entries (inodes) were added under /proc/<pid>/ns, one for each namespace (in kernel 3.8 and higher).

● Each namespace has a unique inode number.

● The inode number of each namespace is created when the namespace is created.


Nameless namespaces

● ls -al /proc/<pid>/ns

lrwxrwxrwx 1 root root 0 Apr 24 17:29 ipc -> ipc:[4026531839]

lrwxrwxrwx 1 root root 0 Apr 24 17:29 mnt -> mnt:[4026531840]

lrwxrwxrwx 1 root root 0 Apr 24 17:29 net -> net:[4026531956]

lrwxrwxrwx 1 root root 0 Apr 24 17:29 pid -> pid:[4026531836]

lrwxrwxrwx 1 root root 0 Apr 24 17:29 user -> user:[4026531837]

lrwxrwxrwx 1 root root 0 Apr 24 17:29 uts -> uts:[4026531838]

You can also use readlink.
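The same check can be done from C with readlink(). This sketch assumes, as on the 3.8-era kernels discussed here, two things: the link target has the form type:[inode], and the inode number alone identifies the namespace. The helper names parse_ns_link(), ns_inode() and same_ns() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>       /* snprintf(), sscanf() */
#include <string.h>
#include <sys/types.h>   /* pid_t */
#include <unistd.h>      /* readlink(), getpid() */

/*
 * Parses a link target such as "net:[4026531956]". Returns 0 and stores
 * the inode number if the target is "<type>:[<number>]" with nothing
 * after the closing bracket, -1 otherwise.
 */
static int parse_ns_link(const char *target, const char *type,
                         unsigned long *ino)
{
    size_t n = strlen(type);
    int consumed = 0;

    if (strncmp(target, type, n) != 0)
        return -1;
    if (sscanf(target + n, ":[%lu]%n", ino, &consumed) != 1 || consumed == 0)
        return -1;
    return target[n + consumed] == '\0' ? 0 : -1;
}

/* Reads /proc/<pid>/ns/<type> and returns its inode number in *ino. */
static int ns_inode(pid_t pid, const char *type, unsigned long *ino)
{
    char path[64], target[128];
    ssize_t len;

    snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, type);
    len = readlink(path, target, sizeof(target) - 1);
    if (len == -1)
        return -1;
    target[len] = '\0';   /* readlink() does not NUL-terminate */
    return parse_ns_link(target, type, ino);
}

/* 1 if both processes share the namespace type, 0 if not, -1 on error. */
static int same_ns(pid_t a, pid_t b, const char *type)
{
    unsigned long ia, ib;

    if (ns_inode(a, type, &ia) == -1 || ns_inode(b, type, &ib) == -1)
        return -1;
    return ia == ib;
}
```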


Implementation - contd

● A member named nsproxy was added to the process descriptor, struct task_struct.

● A method named task_nsproxy(struct task_struct *tsk) was added to access the nsproxy of a specified process (include/linux/nsproxy.h).

● nsproxy includes 5 inner namespaces: uts_ns, ipc_ns, mnt_ns, pid_ns, net_ns.

Notice that the user namespace is missing from this list:

● it is a member of the credentials object (struct cred), which is a member of the process descriptor, task_struct.

● There is an initial, default namespace for each namespace type.


Implementation - contd

● Kernel config items: CONFIG_NAMESPACES

CONFIG_UTS_NS

CONFIG_IPC_NS

CONFIG_USER_NS

CONFIG_PID_NS

CONFIG_NET_NS

● User space additions:

● IPROUTE package: some additions like ip netns add/ip netns del and more.

● util-linux package:

– the unshare util, with support for all 6 namespaces.

– nsenter, a wrapper around setns().


UTS namespace

● uts - (Unix timesharing)

– Very simple to implement.

Added a member named uts_ns (uts_namespace object) to the nsproxy.

[Figure: process descriptor (task_struct) → nsproxy → uts_ns (uts_namespace object) → name (new_utsname object). The new_utsname struct holds sysname, nodename, release, version, machine and domainname.]


UTS namespace - contd

The old implementation of gethostname():

asmlinkage long sys_gethostname(char __user *name, int len)
{
    ...
    if (copy_to_user(name, system_utsname.nodename, i))
        errno = -EFAULT;
    ...
}

(system_utsname is a global)

kernel/sys.c, Kernel v2.6.11.5


UTS namespace - contd

A method called utsname() was added:

static inline struct new_utsname *utsname(void)
{
    return &current->nsproxy->uts_ns->name;
}

The new implementation of gethostname():

SYSCALL_DEFINE2(gethostname, char __user *, name, int, len)
{
    struct new_utsname *u;
    ...
    u = utsname();
    if (copy_to_user(name, u->nodename, i))
        errno = -EFAULT;
    ...
}

Similar approach in uname() and sethostname() syscalls.
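This one level of indirection is enough to isolate hostnames. The toy user-space model below shows why, using the chain task_struct → nsproxy → uts_ns → name. It is not kernel code: each structure is cut down to a single field. toy_unshare_uts() is a hypothetical stand-in for the effect of unshare(CLONE_NEWUTS): the caller gets its own copy of the UTS namespace, while other tasks keep the original.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy versions of the kernel structures (all other fields omitted). */
struct new_utsname   { char nodename[65]; };
struct uts_namespace { struct new_utsname name; };
struct nsproxy       { struct uts_namespace *uts_ns; };
struct task_struct   { struct nsproxy *nsproxy; };

/* Mirrors utsname(): always go through the task's nsproxy. */
static struct new_utsname *toy_utsname(struct task_struct *current)
{
    return &current->nsproxy->uts_ns->name;
}

/* gethostname()/sethostname() only ever touch the caller's namespace. */
static const char *toy_gethostname(struct task_struct *current)
{
    return toy_utsname(current)->nodename;
}

static void toy_sethostname(struct task_struct *current, const char *name)
{
    struct new_utsname *u = toy_utsname(current);

    strncpy(u->nodename, name, sizeof(u->nodename) - 1);
    u->nodename[sizeof(u->nodename) - 1] = '\0';
}

/* Rough analogue of unshare(CLONE_NEWUTS): copy, then repoint the task. */
static int toy_unshare_uts(struct task_struct *current)
{
    struct nsproxy *np = malloc(sizeof(*np));
    struct uts_namespace *ns = malloc(sizeof(*ns));

    if (np == NULL || ns == NULL) {
        free(np);
        free(ns);
        return -1;
    }
    *ns = *current->nsproxy->uts_ns;   /* the new namespace starts as a copy */
    np->uts_ns = ns;
    /* The old nsproxy stays in use by other tasks (the kernel refcounts it). */
    current->nsproxy = np;
    return 0;
}
```

Two tasks that start out sharing one nsproxy see different hostnames once one of them has "unshared" and called toy_sethostname().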


UTS namespace - Example

We have a machine whose hostname is myoldhostname.

uname -n
myoldhostname

unshare -u /bin/bash

This creates a UTS namespace with the unshare() syscall and calls execvp() to invoke bash.

Then:

hostname mynewhostname

uname -n
mynewhostname

Now, if we run uname -n from a different terminal, we will still see myoldhostname.
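The same experiment can be sketched in C, without the exec of bash. This assumes glibc's unshare() and sethostname() wrappers. set_private_hostname() is a hypothetical name. The call is expected to fail (typically with EPERM) unless the process has CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>    /* unshare(), CLONE_NEWUTS */
#include <string.h>
#include <unistd.h>   /* sethostname(), gethostname() */

/*
 * What "unshare -u" followed by "hostname <name>" does: move the calling
 * process into a new UTS namespace, then change the hostname there only.
 * Returns 0 on success, -1 on failure (errno is set).
 */
static int set_private_hostname(const char *name)
{
    if (unshare(CLONE_NEWUTS) == -1)
        return -1;
    return sethostname(name, strlen(name));
}
```

After a successful call, uname -n in another terminal still prints the original hostname. Only this process (and any children it creates afterwards) sees the new one.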


UTS namespace - Example

nsexec

nsexec is a package by Serge Hallyn; it consists of a program called nsexec.c, which creates tasks in new namespaces by clone() or by unshare() with fork() (there are some more utils in it).

https://launchpad.net/~serge-hallyn/+archive/nsexec

Again we have a machine whose hostname is myoldhostname.

uname -n
myoldhostname


IPC namespaces

The same principle as UTS; nothing special, just more code.

Added a member named ipc_ns (ipc_namespace object) to the nsproxy.

● CONFIG_POSIX_MQUEUE or CONFIG_SYSVIPC must be set.


Network Namespaces

● A network namespace is logically another copy of the network stack,

with its own routes, firewall rules, and network devices.

● The network namespace is struct net. (defined in

include/net/net_namespace.h)

struct net includes all network stack ingredients, such as:

– Loopback device.

– SNMP stats (netns_mib).

– All network tables: routing, neighboring, etc.

– All sockets

– /procfs and /sysfs entries.


Implementation guidelines

• A network device belongs to exactly one network namespace.

● Added to struct net_device: struct net *nd_net, for the network namespace this network device is inside.

● Added a method, dev_net(const struct net_device *dev), to access the nd_net namespace of a network device.

• A socket belongs to exactly one network namespace.

● Added sk_net to struct sock (also a pointer to struct net), for the network namespace this socket is inside.

● Added sock_net() and sock_net_set() methods (get/set the network namespace of a socket).


Network Namespaces - contd

● Added a system-wide linked list of all network namespaces, net_namespace_list, and a macro to traverse it (for_each_net()).

● The initial network namespace, init_net (instance of struct net), includes

the loopback device and all physical devices, the networking tables, etc.

● Each newly created network namespace includes only the loopback device.

● There are no sockets in a newly created namespace:

netstat -nl

Active Internet connections (only servers)

Proto Recv-Q Send-Q Local Address Foreign Address State

Active UNIX domain sockets (only servers)

Proto RefCnt Flags Type State I-Node Path


Example

● Create two namespaces, called "myns1" and "myns2":  

● ip netns add myns1

● ip netns add myns2

– (In Fedora 18, ip netns is included in the iproute package.)

● This triggers:

● creation of the empty files /var/run/netns/myns1 and /var/run/netns/myns2, which serve as mount points;

● calling the unshare() system call with CLONE_NEWNET;

– unshare() does not trigger cloning of a process; it does create a new namespace (a network namespace, because of the CLONE_NEWNET flag).

● bind-mounting /proc/self/ns/net onto the new file, which keeps the namespace alive after the ip command exits.

● see netns_add() in ipnetns.c (iproute2)
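The steps ip netns add performs can be sketched in C as follows. This simplification assumes that netns_add() in iproute2 does three things: it creates an empty file as a mount point, calls unshare(CLONE_NEWNET), and bind-mounts /proc/self/ns/net onto that file. create_named_netns() is a hypothetical name. Directory creation and detailed error reporting are omitted, and root privileges are required.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>       /* open() */
#include <sched.h>       /* unshare(), CLONE_NEWNET */
#include <sys/mount.h>   /* mount(), MS_BIND */
#include <unistd.h>      /* close(), unlink() */

/*
 * path would be e.g. "/var/run/netns/myns1". Returns 0 on success, -1 on
 * failure. Note: on success the caller itself now runs inside the new
 * network namespace (for "ip netns add" that is harmless, ip exits).
 */
static int create_named_netns(const char *path)
{
    /* 1. Empty file that will serve as the bind-mount target. */
    int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0);
    if (fd == -1)
        return -1;
    close(fd);

    /* 2. New network namespace; 3. pin it by bind-mounting its ns file. */
    if (unshare(CLONE_NEWNET) == -1 ||
        mount("/proc/self/ns/net", path, "none", MS_BIND, NULL) == -1) {
        unlink(path);   /* remove the mount point again on failure */
        return -1;
    }
    return 0;
}
```

The bind mount is what later lets setns() (via an fd opened on /var/run/netns/myns1) find the namespace even when no process is inside it.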


● You can use the file descriptor of /var/run/netns/myns1 with the setns() system call.

● From man 2 setns:

...

int setns(int fd, int nstype);

DESCRIPTION

Given a file descriptor referring to a namespace, reassociate the calling

thread with that namespace.

...

● If you pass 0 as nstype, no check is done on the type of namespace the fd refers to.

● If you pass a specific nstype, like CLONE_NEWNET or CLONE_NEWUTS, the method verifies that the specified nstype corresponds to the namespace the fd refers to.
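Here is a sketch of joining a namespace through such a file, assuming glibc's setns() wrapper (glibc 2.14 or later). join_netns() is a hypothetical helper. It passes CLONE_NEWNET as nstype, so handing it the fd of, say, a UTS namespace fails. It does not silently join the wrong kind of namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>    /* open(), O_CLOEXEC */
#include <sched.h>    /* setns(), CLONE_NEWNET */
#include <unistd.h>   /* close() */

/*
 * Joins the network namespace referred to by path, e.g.
 * "/var/run/netns/myns1" or "/proc/<pid>/ns/net".
 * Returns 0 on success, -1 on failure (errno is set).
 */
static int join_netns(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* nstype != 0: the kernel checks that fd is a network namespace. */
    int ret = setns(fd, CLONE_NEWNET);
    close(fd);   /* the namespace stays joined after the fd is closed */
    return ret;
}
```

The network part of ip netns exec myns1 <cmd> roughly amounts to join_netns("/var/run/netns/myns1") followed by an exec of the command.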


Network Namespaces - delete

● You delete a namespace by:

● ip netns del myns1

– This unmounts and removes /var/run/netns/myns1

– see netns_delete() in ipnetns.c

– Will not delete a network namespace if there is one or more processes attached to it.

● Notice that after deleting a namespace, all its migratable network devices

are moved to the default network namespace;

● Unmoveable devices (devices that have NETIF_F_NETNS_LOCAL in their features) and virtual devices are not moved to the default network namespace.

● (The semantics of migratable network devices and unmoveable devices

are taken from default_device_exit() method, net/core/dev.c).


NETIF_F_NETNS_LOCAL

● NETIF_F_NETNS_LOCAL is a network device feature

– (a member of the net_device struct, of type netdev_features_t)

● It is set for devices that are not allowed to move between network namespaces; sometimes these devices are named "local devices".

● Examples of local devices (where NETIF_F_NETNS_LOCAL is set):

– Loopback, VXLAN, ppp, bridge.

– You can see it with ethtool (by ethtool -k, or ethtool --show-features)

– ethtool -k p2p1

netns-local: off [fixed]

For the loopback device:

ethtool -k lo

netns-local: on [fixed]


VXLAN

● Virtual eXtensible Local Area Network.

● VXLAN is a standard protocol to transfer layer 2 Ethernet packets

over UDP.

● Why do we need it?

● There are firewalls which block tunnels and allow, for example, only TCP/UDP traffic.

● Developed by Stephen Hemminger.

– drivers/net/vxlan.c

– IANA assigned port is 4789

– Linux default is 8472 (legacy)


When trying to move a device with NETIF_F_NETNS_LOCAL flag, like

VXLAN, from one namespace to another, we will encounter an error:

ip link add myvxlan type vxlan id 1
ip link set myvxlan netns myns1

We will get: RTNETLINK answers: Invalid argument

int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat)
{
    int err;

    err = -EINVAL;
    if (dev->features & NETIF_F_NETNS_LOCAL)
        goto out;
    ...
}
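The effect of that check can be shown with a toy model. The structures, the bit value of TOY_NETIF_F_NETNS_LOCAL and toy_dev_change_net_namespace() are all hypothetical simplifications. Only the NETIF_F_NETNS_LOCAL test from the excerpt above is modelled, not the rest of dev_change_net_namespace().

```c
#include <assert.h>
#include <errno.h>   /* EINVAL */

/* Hypothetical feature bit; the kernel's real value differs. */
#define TOY_NETIF_F_NETNS_LOCAL (1u << 0)

struct toy_net { const char *name; };

struct toy_net_device {
    const char *name;
    unsigned int features;
    struct toy_net *nd_net;   /* namespace this device is inside */
};

/* Local devices refuse to move; others simply get a new nd_net. */
static int toy_dev_change_net_namespace(struct toy_net_device *dev,
                                        struct toy_net *net)
{
    if (dev->features & TOY_NETIF_F_NETNS_LOCAL)
        return -EINVAL;   /* ip reports "RTNETLINK answers: Invalid argument" */
    dev->nd_net = net;
    return 0;
}
```

Moving a p2p1-like device succeeds and updates nd_net. A VXLAN-like device with the local bit set keeps its original namespace, and the caller gets -EINVAL.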


● You can list the network namespaces (which were added via “ip netns add”) with:

● ip netns list

– this simply reads the namespaces under /var/run/netns

● You can find the pid (or list of pids) in a specified net namespace by:

– ip netns pids namespaceName

● You can find the net namespace of a specified pid by:

– ip/ip netns identify #pid


You can monitor addition/removal of network

namespaces by:

ip netns monitor

- prints one line for each addition/removal event it sees


● Assigning p2p1 interface to myns1 network namespace: 

● ip link set p2p1 netns myns1

– This triggers changing the network namespace of the net_device to “myns1”.

– It is handled by dev_change_net_namespace(), net/core/dev.c.

● Now, running:

● ip netns exec myns1 bash

● will transfer me to the myns1 network namespace; so if I run there:

● ifconfig -a 

● I will see p2p1 (and the loopback device);

– Also under /sys/class/net, there will be only p2p1 and lo folders.

● But if I open a new terminal and type ifconfig -a, I will not see p2p1.


● Also, when going to the second namespace by running:

● ip netns exec myns2 bash

● will transfer me to the myns2 network namespace; but if we run there:

● ifconfig -a

– We will not see p2p1; we will only see the loopback device.

● We move a network device to the default, initial namespace by:

ip link set p2p1 netns 1


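● In that namespace, network applications that look for files under /etc will first look in /etc/netns/myns1/, and then in /etc. (ip netns exec achieves this by bind-mounting each file found in /etc/netns/myns1/ over the corresponding file in /etc, inside a new mount namespace.)

● For example, if we add the entry "192.168.2.111 www.dummy.com" to /etc/netns/myns1/hosts, and run (from the shell started with ip netns exec myns1 bash):

● ping www.dummy.com

● we will see that we are pinging 192.168.2.111.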


veth

● You can communicate between two network namespaces by:

● creating a pair of network devices (veth) and moving one of them to the other network namespace.

● Veth (Virtual Ethernet) is like a pipe.

● Unix domain sockets (which use paths on the filesystem).

Example with veth:

Create two namespaces, myns1 and myns2:

ip netns add myns1

ip netns add myns2


veth

ip netns exec myns1 bash

- open a shell of myns1 net namespace

ip link add name if_one type veth peer name if_one_peer

- create veth interface, with if_one and if_one_peer

- ifconfig running in myns1 will show if_one and if_one_peer

and lo (the loopback device)

- ifconfig running in myns2 will show only lo (the loopback

device)

Run from myns1 shell:

ip link set dev if_one_peer netns myns2

- move if_one_peer to myns2

- now ifconfig running in myns2 will show if_one_peer

and lo (the loopback device)

- Now set ip addresses to if_one (myns1) and if_one_peer

(myns2) and you can send traffic.


unshare util

● The unshare utility

● Util-linux recent git tree has the unshare utility with support for all six namespaces:

http://git.kernel.org/cgit/utils/util-linux/util-linux.git

./unshare --help

...

Options:

-m, --mount unshare mounts namespace

-u, --uts unshare UTS namespace (hostname etc)

-i, --ipc unshare System V IPC namespace

-n, --net unshare network namespace

-p, --pid unshare pid namespace

-U, --user unshare user namespace


● For example:

● Type:

● ./unshare --net bash

– A new network namespace was created, and the bash process was started inside that namespace.

● Now run ifconfig -a

● You will see only the loopback device.

– With the unshare util, nothing is created under /var/run/netns; also, network applications in the net namespace we created do not look under /etc/netns.

– If you kill this bash or exit from it, the network namespace will be freed.


This is not the case with ip netns exec myns1 bash: there, killing or exiting the bash does not destroy the namespace, since the bind mount on /var/run/netns/myns1 still refers to it.

For implementation details, look at put_net(struct net *net) and at the reference count (named “count”) of the network namespace struct, struct net.
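The difference between the two cases can be illustrated with a toy reference-counting model. It is a deliberate simplification, not the kernel's struct net, get_net() or put_net(). Here every process in the namespace holds one reference, and so does the bind mount created by ip netns add. The namespace is torn down when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy network namespace: a reference count ("count") and a freed flag. */
struct toy_net {
    int count;     /* references currently held */
    bool freed;    /* set once the last reference is dropped */
};

/* Take a reference (a process joining, a bind mount being created, ...). */
static void toy_get_net(struct toy_net *net)
{
    net->count++;
}

/* Drop a reference; the namespace goes away with the last one. */
static void toy_put_net(struct toy_net *net)
{
    if (--net->count == 0)
        net->freed = true;   /* the kernel would start the cleanup here */
}
```

With unshare --net bash, the shell holds the only reference, so exiting it frees the namespace. With ip netns add plus ip netns exec, the bind mount's reference outlives the shell. The namespace is only freed once ip netns del unmounts the file.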


Mount namespaces

● Added a member named mnt_ns (mnt_namespace object) to the nsproxy.

● We copy the mount namespace of the calling process using a generic filesystem method (see copy_tree() in dup_mnt_ns()).

● In the new mount namespace, all previous mounts will be visible; and from now on:

● mounts/unmounts in that mount namespace are invisible to the rest of the system.

● mounts/unmounts in the global namespace are visible in that namespace.

● The pam_namespace module uses mount namespaces (with unshare(CLONE_NEWNS)) (modules/pam_namespace/pam_namespace.c).


mount namespaces: example 1

Example 1 (tested on Ubuntu):

Verify that /dev/sda3 is not mounted:

mount | grep /dev/sda3

should give nothing.

unshare -m /bin/bash

mount /dev/sda3 /mnt/sda3

Now run mount | grep sda3

We will see:

/dev/sda3 on /mnt/sda3 type ext3 (rw)

readlink /proc/$$/ns/mnt

mnt:[4026532114]


From another terminal, run:

readlink /proc/$$/ns/mnt

mnt:[4026531840]

The result shows that we are in a different namespace.

Now run:

mount | grep sda3

/dev/sda3 on /mnt/sda3 type ext3 (rw)

Why? We are in a different mount namespace! We should not have seen the mount, which was done from another namespace!


The answer is simple: running mount is not good enough when working with mount namespaces. The reason is that mount reads /etc/mtab, which was updated by the mount command; the mount command does not access the kernel structures.

What is the solution?


To access the kernel data structures directly, you should run:

cat /proc/mounts | grep sda3

(/proc/mounts is in fact a symbolic link to /proc/self/mounts.)

Now you will get no results, as expected.
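The same check can be made from C by reading /proc/self/mounts with the glibc <mntent.h> functions, instead of parsing the output of mount. is_mounted() is a hypothetical helper, and the sketch compares only the device name (mnt_fsname).

```c
#include <assert.h>
#include <mntent.h>   /* setmntent(), getmntent(), endmntent() */
#include <stdio.h>
#include <string.h>

/*
 * Checks whether a device (e.g. "/dev/sda3") is mounted in the caller's
 * mount namespace, using the kernel's view in /proc/self/mounts rather
 * than /etc/mtab. Returns 1 if mounted, 0 if not, -1 on error.
 */
static int is_mounted(const char *device)
{
    FILE *f = setmntent("/proc/self/mounts", "r");
    if (f == NULL)
        return -1;

    struct mntent *m;
    int found = 0;

    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_fsname, device) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}
```

Run from the second terminal in the example above, is_mounted("/dev/sda3") would be expected to return 0 even though mount still lists the device.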


mount namespaces: example 2

Example 2 (tested on Fedora 18):

Verify that /dev/sdb3 is not mounted:

mount | grep sdb3

should give nothing.

unshare -m /bin/bash

mount /dev/sdb3 /mnt/sdb3

Now run mount | grep sdb3

You will see:

/dev/sdb3 on /mnt/sdb3 type ext4 (rw,relatime,data=ordered)

readlink /proc/$$/ns/mnt

mnt:[4026532381]


From another terminal, run:

readlink /proc/$$/ns/mnt

mnt:[4026531840]

This shows that we are in a different namespace.

Now run:

mount | grep sdb3

/dev/sdb3 on /mnt/sdb3 type ext4 (rw,relatime,data=ordered)

- We now know that we should use cat /proc/mounts (and not mount) to get the right answer when working with namespaces; so:

cat /proc/mounts | grep sdb3

/dev/sdb3 /mnt/sdb3 ext4 rw,relatime,data=ordered 0 0

Why is that? We should have seen no results, as in the previous example.


Answer: Fedora runs systemd, and systemd uses the shared flag when mounting /.

From the systemd source code (src/core/mount-setup.c):

int mount_setup(bool loaded_policy) {
        ...
        if (mount(NULL, "/", NULL, MS_REC|MS_SHARED, NULL) < 0)
                log_warning("Failed to set up the root directory for shared mount propagation: %m");
        ...
}

(MS_REC stands for recursive mount.)

How do I know whether we have shared flags?

cat /proc/self/mountinfo | grep shared

We will see:

...

33 1 8:3 / / rw,relatime shared:1 - ext4 /dev/sda3 rw,data=ordered

...
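
A sketch of the same check in C, assuming the mountinfo layout shown above: six fixed fields, then optional fields such as shared:N, then a lone "-" separator. The function name is hypothetical; feed it each line of /proc/self/mountinfo.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the peer-group ID N from a "shared:N" optional field of one
 * mountinfo line, or 0 if the mount is not shared. */
int mountinfo_shared_group(const char *line)
{
    char buf[1024];
    snprintf(buf, sizeof buf, "%s", line);
    char *save = NULL;
    int field = 0;
    for (char *tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        if (++field <= 6)
            continue;   /* ID, parent ID, major:minor, root, mount point, options */
        if (strcmp(tok, "-") == 0)
            break;      /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            return atoi(tok + 7);
    }
    return 0;
}
```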

What to do?

mount --make-rprivate -o remount / /dev/sda3

This changes the shared flag to private, recursively.

--make-rprivate: set the private flag recursively.

Shared subtrees

[Figure: directory tree with / at the top and /bin, /mnt and /users below it; /users contains /users/user1 and /users/user2]

Now we want the user1 and user2 folders to see the whole filesystem; we will run:

mount --bind / /users/user1

mount --bind / /users/user2

By default, the filesystem is mounted as private, unless the shared mount flag is set explicitly.

Shared subtrees - contd

[Figure: after the two bind mounts, /users/user1 and /users/user2 each show their own copy of /bin, /mnt and /users]

Shared subtrees – Quiz

Quiz: Now we mount a USB disk-on-key on /mnt/dok. Will it be seen in /users/user1/mnt or /users/user2/mnt?

Shared subtrees - contd

The answer is no, since by default the filesystem is mounted as private. To make the dok visible also under /users/user1/mnt or /users/user2/mnt, we should mount the filesystem as shared:

mount / --make-rshared

And then mount the USB disk-on-key again.

The shared subtrees patch is from 2005, by Ram Pai. It adds mount options like --make-slave, --make-rslave, --make-unbindable, --make-runbindable and more. The patch added these kernel mount flags: MS_UNBINDABLE, MS_PRIVATE, MS_SLAVE and MS_SHARED.

The shared flag is used by the FUSE filesystem.
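
The --make-* options correspond to the MS_SHARED, MS_PRIVATE, MS_SLAVE and MS_UNBINDABLE flags, and the "r" variants add MS_REC, which is the combination the systemd code uses with MS_REC|MS_SHARED. A minimal sketch of that mapping; the helper names propagation_flags() and set_propagation() are hypothetical, and actually changing propagation needs CAP_SYS_ADMIN.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Map a propagation type as used in mount --make-<type> to mount(2) flags.
 * A leading 'r' (rshared, rprivate, rslave, runbindable) adds MS_REC.
 * Returns 0 for an unknown type. */
unsigned long propagation_flags(const char *type)
{
    static const struct {
        const char *name;
        unsigned long flags;
    } table[] = {
        { "shared", MS_SHARED },
        { "private", MS_PRIVATE },
        { "slave", MS_SLAVE },
        { "unbindable", MS_UNBINDABLE },
    };
    unsigned long rec = 0;
    if (type[0] == 'r') {
        rec = MS_REC;
        type++;
    }
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(type, table[i].name) == 0)
            return table[i].flags | rec;
    return 0;
}

/* Change the propagation type of an existing mount point; for example
 * set_propagation("/", "rprivate") is like mount --make-rprivate /.
 * Returns 0 on success, -1 on error. */
int set_propagation(const char *target, const char *type)
{
    unsigned long flags = propagation_flags(type);
    if (flags == 0)
        return -1;
    /* source, filesystem type and data are ignored for this operation */
    return mount(NULL, target, NULL, flags, NULL);
}
```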

PID namespaces

● Added a member named pid_ns (pid_namespace object) to the nsproxy.

● Processes in different PID namespaces can have the same process ID.

● When creating the first process in a new namespace, its PID is 1.

● Behavior like the “init” process:

– When a process dies, all its orphaned children will now have the process with PID 1 as their parent (child reaping).

– Sending a SIGKILL signal from inside the namespace does not kill its process 1. (From an ancestor PID namespace, SIGKILL and SIGSTOP are forcibly delivered to it.)

PID namespaces - contd

● From a new PID namespace we cannot see the PIDs of the parent namespace; running getppid() in the first process (PID 1) of the new PID namespace returns 0.

● But all PIDs which are used in this namespace are visible to the parent namespace.

● PID namespaces can be nested, up to 32 nesting levels (MAX_PID_NS_LEVEL).

● See: multi_pidns.c, Michael Kerrisk, from http://lwn.net/Articles/532745/.

● When trying to run multi_pidns with 33, you will get:

– clone: Invalid argument
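
A sketch of checking the PID 1 and getppid() points from C, assuming the caller has CAP_SYS_ADMIN (otherwise unshare() fails with EPERM and the function returns -1). The name probe_new_pidns() is hypothetical. unshare(CLONE_NEWPID) does not move the caller; only its next child becomes PID 1, so the work is done in a forked helper to leave the calling process untouched.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the first process of a new PID namespace saw getpid() == 1
 * and getppid() == 0, 1 if it saw something else, and -1 if the namespace
 * could not be created. */
int probe_new_pidns(void)
{
    pid_t helper = fork();
    if (helper < 0)
        return -1;
    if (helper == 0) {
        if (unshare(CLONE_NEWPID) == -1)
            _exit(2);
        pid_t child = fork();          /* becomes PID 1 of the new namespace */
        if (child < 0)
            _exit(2);
        if (child == 0)
            _exit(getpid() == 1 && getppid() == 0 ? 0 : 1);
        int st;
        if (waitpid(child, &st, 0) < 0 || !WIFEXITED(st))
            _exit(2);
        _exit(WEXITSTATUS(st));
    }
    int status;
    if (waitpid(helper, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```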

User Namespaces

● Added a member named user_ns (user_namespace object) to the process credentials (struct cred); unlike the other namespaces, it is not part of the nsproxy.

● include/linux/user_namespace.h

● Includes a pointer named parent to the user_namespace that created it:

● struct user_namespace *parent;

● Includes the effective uid of the process that created it:

● kuid_t owner;

● A process will have a distinct set of UIDs, GIDs and capabilities.

User Namespaces

Creating a new user namespace is done by passing CLONE_NEWUSER to clone() or unshare().

Example: running from some user account:

id -u

1000 // 1000 is the effective user ID.

id -g

1000 // 1000 is the effective group ID.

(Usually the first user added gets a uid/gid of 1000.)

User Namespaces - example

Capabilities:

cat /proc/self/status | grep Cap

CapInh: 0000000000000000

CapPrm: 0000000000000000

CapEff: 0000000000000000

CapBnd: 0000001fffffffff

In order to create a user namespace and start a shell, we will run from that non-root account:

./nsexec -cU /bin/bash

● The c flag is for using clone

● The U flag is for using a user namespace (CLONE_NEWUSER flag for clone())

User Namespaces - example - contd

Now, from the new shell, run:

id -u

65534

id -g

65534

● These are the default values for the eUID and eGID in the new namespace (no UID/GID mapping has been set up yet).

● We will get the same results for the effective user ID and effective group ID also when running ./nsexec -cU /bin/bash as root.

● The defaults can be changed by: /proc/sys/kernel/overflowuid, /proc/sys/kernel/overflowgid

● In fact, the process in the newly created user namespace had full capabilities, but the call to exec() with bash removed them.
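
To see the capabilities before exec() removes them, a sketch can call unshare(CLONE_NEWUSER) and then inspect its own status without exec'ing anything. The names probe_new_userns() and struct userns_report are hypothetical. The work runs in a forked child, because unshare(CLONE_NEWUSER) would otherwise move the caller itself (and it fails in a multithreaded process). Where unprivileged user namespaces are disabled, report.ok stays 0.

```c
#define _GNU_SOURCE
#include <inttypes.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct userns_report {
    int ok;          /* 1 if unshare(CLONE_NEWUSER) succeeded */
    uid_t euid;      /* geteuid() inside the new user namespace */
    uint64_t capeff; /* CapEff from /proc/self/status inside it */
};

/* Read the effective capability mask of the calling process. */
static int read_capeff(uint64_t *out)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    char line[256];
    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "CapEff: %" SCNx64, out) == 1) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}

/* Create a user namespace in a forked child and report what the child sees
 * before any exec(). Returns 0 if a report was received, -1 otherwise. */
int probe_new_userns(struct userns_report *rep)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct userns_report r = { 0 };
        close(fds[0]);
        if (unshare(CLONE_NEWUSER) == 0) {
            r.ok = 1;
            r.euid = geteuid();          /* unmapped, so the overflow UID */
            if (read_capeff(&r.capeff) == -1)
                r.capeff = 0;
        }
        _exit(write(fds[1], &r, sizeof r) == (ssize_t)sizeof r ? 0 : 1);
    }
    close(fds[1]);
    ssize_t n = read(fds[0], rep, sizeof *rep);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == (ssize_t)sizeof *rep ? 0 : -1;
}
```

With default settings one would expect euid to be 65534, as id -u showed, while capeff is non-zero, unlike the CapEff value the exec'd bash reports.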

cat /proc/self/status | grep Cap

CapInh: 0000000000000000

CapPrm: 0000000000000000

CapEff: 0000000000000000

CapBnd: 0000001fffffffff

User Namespaces - contd

Now run:

echo $$ (get the bash pid)

Now, from a different root terminal, we set the uid_map.

First, we can see that uid_map is uninitialized (empty) by:

cat /proc/<pid>/uid_map

Then:

echo 0 1000 10 > /proc/<pid>/uid_map

(<pid> is the pid of the bash process from the previous step.)

An entry in uid_map has the following format:

namespace_first_uid host_first_uid number_of_uids

So this sets the first uid in the new namespace (which corresponds to uid 1000 in the outside world) to be 0; the second (uid 1001 outside) will be 1; and so forth, for 10 entries.
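
The mapping rule is simple arithmetic. A sketch with a hypothetical helper map_ns_uid() that handles a single uid_map line and translates a UID inside the namespace to the host UID:

```c
#include <stdio.h>

/* Translate a UID inside a user namespace to the UID outside it, using one
 * uid_map line "namespace_first_uid host_first_uid number_of_uids".
 * Returns -1 if the line is malformed or ns_uid is not covered by it. */
long long map_ns_uid(const char *map_line, unsigned long ns_uid)
{
    unsigned long ns_first, host_first, count;
    if (sscanf(map_line, "%lu %lu %lu", &ns_first, &host_first, &count) != 3)
        return -1;
    if (ns_uid < ns_first || ns_uid - ns_first >= count)
        return -1;                       /* outside the mapped range */
    return (long long)host_first + (long long)(ns_uid - ns_first);
}
```

With the line "0 1000 10", UID 0 maps to 1000, UID 9 maps to 1009, and UID 10 is not mapped.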

User Namespaces - contd

Note: uid_map can be written only once for a given user namespace. Further attempts will fail.

Run id -u: you will get 0.

whoami

root

● The user namespace is the only namespace which can be created without the CAP_SYS_ADMIN capability.

cat /proc/self/status | grep Cap

CapInh: 0000000000000000

CapPrm: 0000001fffffffff

CapEff: 0000001fffffffff

CapBnd: 0000001fffffffff

The CapEff (effective capabilities) value is 1fffffffff -> this is 37 bits of '1', which means all capabilities.
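
Counting the '1' bits of such a mask takes one loop. A sketch with a hypothetical count_caps() that parses one Cap* line of /proc/<pid>/status and clears the lowest set bit until none are left:

```c
#include <stdio.h>

/* Count the set bits in one capability line of /proc/<pid>/status, such as
 * "CapEff:\t0000001fffffffff". Returns -1 if the line does not parse. */
int count_caps(const char *status_line)
{
    char name[16];
    unsigned long long mask;
    if (sscanf(status_line, "%15[^:]: %llx", name, &mask) != 2)
        return -1;
    int bits = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        bits++;
    }
    return bits;
}
```

count_caps("CapEff:\t0000001fffffffff") returns 37, matching the count above.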

Quiz: Will unshare --net bash work now?

Answer: no.

unshare --net bash

unshare: cannot set group id: Invalid argument

But after running, from a different terminal, as root:

echo 0 1000 10 > /proc/2429/gid_map

it will work.

ls /root will fail, however:

ls /root/

ls: cannot open directory /root/: Permission denied

Short quiz 1:

I am a regular user, not root. Will clone() with (CLONE_NEWNET) work?

Short quiz 2:

Will clone() with (CLONE_NEWNET | CLONE_NEWUSER) work?

● Quiz 1: No.

● In order to use CLONE_NEWNET we need to have CAP_SYS_ADMIN.

unshare --net bash

unshare: unshare failed: Operation not permitted

● Quiz 2: Yes.

The namespaces code guarantees that the user namespace is created first. Creating a user namespace does not need CAP_SYS_ADMIN, and the user namespace is created with full capabilities, so the network namespace can then be created successfully.

./unshare --net --user /bin/bash

No errors!
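
Both quiz answers can be checked with a small probe. This sketch uses a hypothetical try_unshare() helper that calls unshare(2), which takes the same CLONE_NEW* flags as clone(), in a throw-away child and returns 0 or the errno the child saw. For a regular user one would expect try_unshare(CLONE_NEWNET) to return EPERM, and try_unshare(CLONE_NEWUSER | CLONE_NEWNET) to return 0 on systems where unprivileged user namespaces are enabled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call unshare(flags) in a throw-away child so that the caller's own
 * namespaces are untouched. Returns 0 on success, the child's errno on
 * failure, or -1 if the child could not be run. */
int try_unshare(int flags)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(unshare(flags) == 0 ? 0 : errno);   /* errno values fit in 8 bits */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```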

Quiz 3:

If you run, from a non-root user,

unshare --user bash

and then

cat /proc/self/status | grep CapEff

CapEff: 0000000000000000

This means no capabilities. So how was the net namespace, which needs CAP_SYS_ADMIN, created?

Answer: we first call unshare(), and the user namespace is created first; this enables all capabilities. Then the network namespace is created. Afterwards, we call exec for the shell; exec removes the capabilities.

From unshare.c of util-linux:

if (-1 == unshare(unshare_flags))
        err(EXIT_FAILURE, _("unshare failed"));
...
exec_shell();

Anatomy of a user namespaces vulnerability, by Michael Kerrisk, March 2013

About CVE-2013-1858 - exploitable security vulnerability

http://lwn.net/Articles/543273/

cgroups

● The cgroups (control groups) subsystem is a resource management solution providing a generic process-grouping framework.

● This work was started by engineers at Google (primarily Paul Menage and Rohit Seth) in 2006 under the name “process containers”; in 2007 it was renamed to “Control Groups”.

● Maintainers: Li Zefan (Huawei) and Tejun Heo.

● The memory controller (memcg) is maintained separately (4 maintainers)

● Probably the most complex.

– Namespaces provide a per-process resource isolation solution.

– Cgroups provide a resource management solution (handling groups).

● Available in the Fedora 18 and Ubuntu 12.10 kernels (and in some earlier releases).

– Fedora systemd uses cgroups.

– Ubuntu does not have systemd. Tip: do tests with Ubuntu, and make sure, by checking with mount, that cgroups are not mounted after boot (packages such as cgroup-lite may mount them).

● The implementation of cgroups requires a few simple hooks into the rest of the kernel, none in performance-critical paths:

– In the boot phase (init/main.c), to perform various initializations.

– In the process creation and termination methods, fork() and exit().

– A new file system of type "cgroup" (VFS)

– Process descriptor additions (struct task_struct)

– Add procfs entries:

● For each process: /proc/pid/cgroup.

● System-wide: /proc/cgroups
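
Each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:cgroup-path, for example 4:memory:/group1. A minimal parser sketch; struct cgroup_entry and parse_cgroup_line() are hypothetical names, and the fixed buffer sizes are an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

struct cgroup_entry {
    int hierarchy_id;
    char controllers[128];   /* comma-separated, e.g. "cpu,cpuacct" */
    char path[256];          /* path relative to the hierarchy root */
};

/* Parse one line of /proc/<pid>/cgroup. Returns 0 on success, -1 if the
 * line is malformed. Over-long paths are truncated. */
int parse_cgroup_line(const char *line, struct cgroup_entry *e)
{
    const char *c1 = strchr(line, ':');
    if (!c1)
        return -1;
    const char *c2 = strchr(c1 + 1, ':');
    if (!c2)
        return -1;
    if (sscanf(line, "%d:", &e->hierarchy_id) != 1)
        return -1;
    size_t clen = (size_t)(c2 - c1 - 1);
    if (clen >= sizeof e->controllers)
        return -1;
    memcpy(e->controllers, c1 + 1, clen);
    e->controllers[clen] = '\0';
    /* The path runs to the end of the line; drop a trailing newline. */
    snprintf(e->path, sizeof e->path, "%.*s",
             (int)strcspn(c2 + 1, "\n"), c2 + 1);
    return 0;
}
```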

– The cgroup modules are not located in one folder, but are scattered in the kernel tree according to their functionality:

● memory: mm/memcontrol.c

● cpuset: kernel/cpuset.c.

● net_prio: net/core/netprio_cgroup.c

● devices: security/device_cgroup.c.

● And so on.

cgroups and kernel namespaces

Note that cgroups do not depend on namespaces; you can build cgroups without kernel namespaces support.

There was an attempt in the past to add an "ns" subsystem (ns_cgroup, the namespace cgroup subsystem); with it, you could mount a namespace subsystem by:

mount -t cgroup -ons

This code was removed in 2011 (by a patch by Daniel Lezcano).

See:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a77aea92010acf54ad785047234418d5d68772e2

cgroups VFS

● Cgroups use a virtual file system.

– All entries created in it are not persistent and are deleted after reboot.

● All cgroups actions are performed via filesystem actions (create/remove a directory, reading/writing to files in it, mounting/mount options).

● For example:

– cgroup inode_operations for cgroup mkdir/rmdir.

– cgroup file_system_type for cgroup mount/unmount.

– cgroup file_operations for reading/writing to control files.

Mounting cgroups

In order to use a filesystem (browse it, attach tasks to cgroups, etc.) it must be mounted.

The control group can be mounted anywhere on the filesystem. Systemd uses /sys/fs/cgroup.

When mounting, we can specify with mount options (-o) which subsystems we want to use.

There are 11 cgroup subsystems (controllers) (kernel 3.9.0-rc4, April 2013); two can be built as modules. (All subsystems are instances of the cgroup_subsys struct.)

cpuset_subsys - defined in kernel/cpuset.c.

freezer_subsys - defined in kernel/cgroup_freezer.c.

mem_cgroup_subsys - defined in mm/memcontrol.c; Aka memcg - memory control groups.

blkio_subsys - defined in block/blk-cgroup.c.

net_cls_subsys - defined in net/sched/cls_cgroup.c (can be built as a kernel module)

net_prio_subsys - defined in net/core/netprio_cgroup.c (can be built as a kernel module)

devices_subsys - defined in security/device_cgroup.c.

perf_subsys (perf_event) - defined in kernel/events/core.c

hugetlb_subsys - defined in mm/hugetlb_cgroup.c.

cpu_cgroup_subsys - defined in kernel/sched/core.c

cpuacct_subsys - defined in kernel/sched/core.c

Mounting cgroups – contd.

In order to mount a subsystem, you should first create a folder for it under /cgroup.

In order to mount a cgroup, you first mount some tmpfs root folder:

● mount -t tmpfs tmpfs /cgroup

Mounting the memory subsystem, for example, is done thus:

● mkdir /cgroup/memtest

● mount -t cgroup -o memory test /cgroup/memtest/

Note that instead of “test” you can insert any text; this text is not handled by the cgroups core. Its only use is when displaying the mount with the “mount” command or with cat /proc/mounts.

Mounting cgroups – contd.

● Mount creates a cgroupfs_root object + a cgroup (top_cgroup) object.

● Mounting another path with the same subsystems (the same subsys_mask) reuses the same cgroupfs_root object.

● mkdir increments number_of_cgroups, rmdir decrements number_of_cgroups.

● cgroup1 - created by mkdir /cgroup/memtest/cgroup1.

[Figure: struct cgroupfs_root, with members struct super_block *sb (the super block being used, in memory), struct cgroup top_cgroup, unsigned long subsys_mask (bitmask of subsystems attached to this hierarchy) and int number_of_cgroups; below the top cgroup are cgroup1, cgroup2 and cgroup3, each linked to its parent and holding a cgroupfs_root *root pointer]

Mounting a set of subsystems

From Documentation/cgroups/cgroups.txt:

If an active hierarchy with exactly the same set of subsystems already exists, it will be reused for the new mount.

If no existing hierarchy matches, and any of the requested subsystems are in use in an existing hierarchy, the mount will fail with -EBUSY.

Otherwise, a new hierarchy is activated, associated with the requested subsystems.
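
The three rules can be modelled as a check over subsystem bitmasks, similar in spirit to the subsys_mask of cgroupfs_root. This is a model for illustration, not the kernel's cgroup_mount() code; the SS_* bit values and the names below are hypothetical.

```c
/* Hypothetical bit numbers for illustration only. */
#define SS_CPU      (1UL << 0)
#define SS_CPUACCT  (1UL << 1)
#define SS_FREEZER  (1UL << 2)
#define SS_MEMORY   (1UL << 3)
#define SS_NET_PRIO (1UL << 4)

enum mount_outcome { MOUNT_REUSE, MOUNT_EBUSY, MOUNT_NEW };

/* active[] holds the subsystem mask of each active hierarchy. */
enum mount_outcome cgroup_mount_outcome(const unsigned long *active,
                                        int nactive, unsigned long requested)
{
    for (int i = 0; i < nactive; i++)
        if (active[i] == requested)
            return MOUNT_REUSE;   /* exactly the same set of subsystems */
    for (int i = 0; i < nactive; i++)
        if (active[i] & requested)
            return MOUNT_EBUSY;   /* a requested subsystem is already in use */
    return MOUNT_NEW;             /* activate a new hierarchy */
}
```

With the examples from the following slides: a hierarchy {cpu,cpuacct} and a request for cpu,cpuacct gives MOUNT_REUSE; hierarchies {freezer} and {memory} and a request for freezer,memory give MOUNT_EBUSY; a request for net_prio alone gives MOUNT_NEW.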

First case: Reuse

● mount -t tmpfs test1 /cgroup/test1

● mount -t tmpfs test2 /cgroup/test2

● mount -t cgroup -ocpu,cpuacct test1 /cgroup/test1

● mount -t cgroup -ocpu,cpuacct test2 /cgroup/test2

● This will work; the mount method recognizes that we want to use the same mask of subsystems in the second case.

– (Behind the scenes, this is done by sget(), called from cgroup_mount(): it finds an already allocated superblock, and sget() makes sure that the mask of the sb and the required mask are identical.)

– Both will use the same cgroupfs_root object.

● This is exactly the first case described in Documentation/cgroups/cgroups.txt

Second case: any of the requested subsystems are in use

● mount -t tmpfs tmpfs /cgroup/tst1/

● mount -t tmpfs tmpfs /cgroup/tst2/

● mount -t tmpfs tmpfs /cgroup/tst3/

● mount -t cgroup -o freezer tst1 /cgroup/tst1/

● mount -t cgroup -o memory tst2 /cgroup/tst2/

● mount -t cgroup -o freezer,memory tst3 /cgroup/tst3

– The last command will give an error (-EBUSY).

The reason: these subsystems (controllers) have already been mounted separately.

● This is exactly the second case described in Documentation/cgroups/cgroups.txt

Third case - no existing hierarchy

No existing hierarchy matches, and none of the requested subsystems are in use in an existing hierarchy:

mount -t cgroup -o net_prio netpriotest /cgroup/net_prio/

This will succeed.

– under each new cgroup which is created, these 4 files are always created:

● tasks

– list of pids which are attached to this group.

● cgroup.procs.

– list of thread group IDs (listed by TGID) attached to this group.

● cgroup.event_control.

– Example in following slides.

● notify_on_release (boolean).

– For a newly created cgroup, the value of notify_on_release is inherited from its parent; however, changing notify_on_release in the parent does not change the value in the children it already has.

– Example in following slides.

– For the topmost cgroup root object only, there is also a release_agent – a command which will be invoked when the last process of a cgroup terminates; the notify_on_release flag must be set for it to be invoked.

● Each subsystem adds specific control files for its own needs, besides these 4 files. All control files created by cgroup subsystems are given a prefix corresponding to their subsystem name. For example:

cpuset.cpus

cpuset.mems

cpuset.cpu_exclusive

cpuset.mem_exclusive

cpuset.mem_hardwall

cpuset.sched_load_balance

cpuset.sched_relax_domain_level

cpuset.memory_migrate

cpuset.memory_pressure

cpuset.memory_spread_page

cpuset.memory_spread_slab

cpuset.memory_pressure_enabled

cpuset subsystem

devices.allow

devices.deny

devices.list

devices subsystem

cpu subsystem

cpu.shares (only if CONFIG_FAIR_GROUP_SCHED is set)

cpu.cfs_quota_us (only if CONFIG_CFS_BANDWIDTH is set)

cpu.cfs_period_us (only if CONFIG_CFS_BANDWIDTH is set)

cpu.stat (only if CONFIG_CFS_BANDWIDTH is set)

cpu.rt_runtime_us (only if CONFIG_RT_GROUP_SCHED is set)

cpu.rt_period_us (only if CONFIG_RT_GROUP_SCHED is set)

memory subsystem

memory.usage_in_bytes

memory.max_usage_in_bytes

memory.limit_in_bytes

memory.soft_limit_in_bytes

memory.failcnt

memory.stat

memory.force_empty

memory.use_hierarchy

memory.swappiness

memory.move_charge_at_immigrate

memory.oom_control

memory.numa_stat (only if CONFIG_NUMA is set)

memory.kmem.limit_in_bytes (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.usage_in_bytes (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.failcnt (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.max_usage_in_bytes (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.tcp.limit_in_bytes (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.tcp.usage_in_bytes (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.tcp.failcnt (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.tcp.max_usage_in_bytes (only if CONFIG_MEMCG_KMEM is set)

memory.kmem.slabinfo (only if CONFIG_SLABINFO is set)

memory.memsw.usage_in_bytes (only if CONFIG_MEMCG_SWAP is set)

memory.memsw.max_usage_in_bytes (only if CONFIG_MEMCG_SWAP is set)

memory.memsw.limit_in_bytes (only if CONFIG_MEMCG_SWAP is set)

memory.memsw.failcnt (only if CONFIG_MEMCG_SWAP is set)

memory subsystem: up to 25 control files

blkio subsystem

blkio.weight_device

blkio.weight

blkio.weight_device

blkio.weight

blkio.leaf_weight_device

blkio.leaf_weight

blkio.time

blkio.sectors

blkio.io_service_bytes

blkio.io_serviced

blkio.io_service_time

blkio.io_wait_time

blkio.io_merged

blkio.io_queued

blkio.time_recursive

blkio.sectors_recursive

blkio.io_service_bytes_recursive

blkio.io_serviced_recursive

blkio.io_service_time_recursive

blkio.io_wait_time_recursive

blkio.io_merged_recursive

blkio.io_queued_recursive

blkio.avg_queue_size (only if CONFIG_DEBUG_BLK_CGROUP is set)

blkio.group_wait_time (only if CONFIG_DEBUG_BLK_CGROUP is set)

blkio.idle_time (only if CONFIG_DEBUG_BLK_CGROUP is set)

blkio.empty_time (only if CONFIG_DEBUG_BLK_CGROUP is set)

blkio.dequeue (only if CONFIG_DEBUG_BLK_CGROUP is set)

blkio.unaccounted_time (only if CONFIG_DEBUG_BLK_CGROUP is set)

blkio.throttle.read_bps_device

blkio.throttle.write_bps_device

blkio.throttle.read_iops_device

blkio.throttle.write_iops_device

blkio.throttle.io_service_bytes

blkio.throttle.io_serviced

netprio

net_prio.ifpriomap

net_prio.prioidx

Note that netprio_cgroup.ko must be loaded (insmod) for the mount to succeed. Moreover, rmmod will fail while netprio is mounted.

– When mounting a cgroup subsystem (or a set of cgroup subsystems), all processes in the system belong to it (the top cgroup object).

● After mount -t cgroup -o memory test /cgroup/memtest/

– you can see all tasks by: cat /cgroup/memtest/tasks

– When creating new child cgroups in that hierarchy, none of them will have any tasks initially.

– Example:

– mkdir /cgroup/memtest/group1

– mkdir /cgroup/memtest/group2

– cat /cgroup/memtest/group1/tasks

● Shows nothing.

– cat /cgroup/memtest/group2/tasks

● Shows nothing.

● Any task can be a member of exactly one cgroup in a specific hierarchy.

● Example:

● echo $$ > /cgroup/memtest/group1/tasks

● cat /cgroup/memtest/group1/tasks

● cat /cgroup/memtest/group2/tasks

● Will show that task only in group1/tasks.

● After:

● echo $$ > /cgroup/memtest/group2/tasks

● The task was moved to group2; we will see that task only in group2/tasks.
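
From C, attaching a task is an ordinary file write. A sketch with a hypothetical cgroup_attach() helper, the equivalent of echo <pid> > <dir>/tasks; cgroups.txt notes that only one PID can be attached per write. The check on fclose() matters because stdio may only flush the write there, and that is where a failed attach would be reported.

```c
#include <stdio.h>
#include <sys/types.h>

/* Move one task into the cgroup directory `cgroup_dir`, like
 * "echo <pid> > <cgroup_dir>/tasks". Returns 0 on success, -1 on error. */
int cgroup_attach(const char *cgroup_dir, pid_t pid)
{
    char path[4096];
    if (snprintf(path, sizeof path, "%s/tasks", cgroup_dir) >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", (int)pid) > 0;
    if (fclose(f) != 0)           /* the buffered write happens here */
        ok = 0;
    return ok ? 0 : -1;
}
```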

Removing a child group

Removing a child group is done by rmdir.

We cannot remove a child group in these two cases:

● When it has processes attached to it.

● When it has children.

We will get an -EBUSY error in both cases.

Example 1 - processes attached to a group:

echo $$ > /cgroup/memtest/group1/tasks

rmdir /cgroup/memtest/group1

rmdir: failed to remove `/cgroup/memtest/group1': Device or resource busy

Example 2 - group has children:

mkdir /cgroup/memtest/group2/childOfGroup2

cat /cgroup/memtest/group2/tasks

- to make sure that there are no processes in group2.

rmdir /cgroup/memtest/group2/

rmdir: failed to remove `/cgroup/memtest/group2/': Device or resource busy


● Nesting is allowed:

– mkdir /cgroup/memtest/0/FirstSon

– mkdir /cgroup/memtest/0/SecondSon

– mkdir /cgroup/memtest/0/ThirdSon

● However, some subsystems emit a kernel warning when you try to nest; in these subsystems, the .broken_hierarchy boolean member of cgroup_subsys is explicitly set to true.

For example:

struct cgroup_subsys devices_subsys = {

.name = "devices",

...

.broken_hierarchy = true,

};

BTW, a recent patch removed it; in the latest for-3.10 git tree, the only subsystem with broken_hierarchy set is blkio.


broken_hierarchy example

● Typing:

● mkdir /sys/fs/cgroup/devices/0

● will emit no warning, but if we afterwards type:

● mkdir /sys/fs/cgroup/devices/0/firstSon

● we will see this warning in the kernel log:

● cgroup: mkdir (4730) created nested cgroup for controller "devices" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.


● In this way, we can mount any one of the 11 cgroup subsystems (controllers) under it:

● mkdir /cgroup/cpuset

● mount -t cgroup -ocpuset cpuset_group /cgroup/cpuset/

● Here too, “cpuset_group” is an arbitrary name used only by the mount command,

– so this will also work:

– mkdir /cgroup2/

– mount -t tmpfs cgroup2_root /cgroup2

– mkdir /cgroup2/cpuset

– mount -t cgroup -ocpuset mytest /cgroup2/cpuset


devices

● Also referred to as: devcg (devices control group)

● The devices cgroup enforces restrictions on open and mknod operations on device files.

● 3 files: devices.allow, devices.deny, devices.list.

– devices.allow can be considered as the devices whitelist.

– devices.deny can be considered as the devices blacklist.

– devices.list shows the available devices (the current whitelist).

● Each entry has 4 fields:

– type: can be a (all), c (char device), or b (block device).

● All means all types of devices, and all major and minor numbers.

– Major number.

– Minor number.

– Access: composition of 'r' (read), 'w' (write) and 'm' (mknod).


devices - example

The /dev/null major number is 1 and its minor number is 3 (you can look up major/minor numbers in Documentation/devices.txt).

mkdir /sys/fs/cgroup/devices/0

By default, for a new group, you have full permissions:

cat /sys/fs/cgroup/devices/0/devices.list

a *:* rwm

echo 'c 1:3 rmw' > /sys/fs/cgroup/devices/0/devices.deny

This denies rwm access to the /dev/null device.

echo $$ > /sys/fs/cgroup/devices/0/tasks

echo "test" > /dev/null

bash: /dev/null: Operation not permitted

echo a > /sys/fs/cgroup/devices/0/devices.allow

This adds the 'a *:* rwm' entry to the whitelist.

echo "test" > /dev/null

Now there is no error.


cpuset

● Creating a cpuset group is done with:

– mkdir /sys/fs/cgroup/cpuset/0

● You must be root to run this; a non-root user will get the following error:

– mkdir: cannot create directory ‘/sys/fs/cgroup/cpuset/0’: Permission denied

● cpusets provide a mechanism for assigning a set of CPUs and memory nodes to a set of tasks.


cpuset example

On Fedora 18, cpuset is mounted after boot on /sys/fs/cgroup/cpuset.

cd /sys/fs/cgroup/cpuset

mkdir test

cd test

/bin/echo 1 > cpuset.cpus

/bin/echo 0 > cpuset.mems

cpuset.cpus and cpuset.mems are not initialized in a new group; both must be set before tasks can be moved into it.

/bin/echo $$ > tasks

Last command moves the shell process to the new cpuset cgroup.

You cannot move a list of pids in a single command; you must issue a separate command for each pid.


memcg (memory control groups)

Example:

mkdir /sys/fs/cgroup/memory/0

echo $$ > /sys/fs/cgroup/memory/0/tasks

echo 10M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes

You can disable the out-of-memory (OOM) killer for a memcg:

echo 1 > /sys/fs/cgroup/memory/0/memory.oom_control

This disables the oom killer.

cat /sys/fs/cgroup/memory/0/memory.oom_control

oom_kill_disable 1

under_oom 0


● Now run, in this cgroup, some memory-hogging process that is known to be killed by the OOM killer under the default settings.

● This process will not be killed.

● After some time, the value of under_oom will change to 1

● After enabling the OOM killer again:

echo 0 > /sys/fs/cgroup/memory/0/memory.oom_control

You will soon get the OOM “Killed” message.


Notification API

● There is an API that lets us get notifications about changes in the status of a cgroup. It uses the eventfd() system call.

● See man 2 eventfd

● It uses the fd of cgroup.event_control.

● Following is a simple userspace app, “eventfd” (error handling was omitted for brevity).


Notification API – example

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Usage: ./eventfd cgroup.event_control memory.oom_control */
int main(int argc, char *argv[])
{
    char buf[256];
    int event_fd, control_fd, oom_fd, wb;
    uint64_t u;

    event_fd = eventfd(0, 0);
    control_fd = open(argv[1], O_WRONLY);
    oom_fd = open(argv[2], O_RDONLY);
    /* Register "<event_fd> <oom_fd>"; wb must hold the string length. */
    wb = snprintf(buf, sizeof(buf), "%d %d", event_fd, oom_fd);
    write(control_fd, buf, wb);
    close(control_fd);
    for (;;) {
        /* Blocks until the kernel signals an OOM event. */
        read(event_fd, &u, sizeof(uint64_t));
        printf("oom event received from mem_cgroup\n");
    }
}


Notification API – example (contd)

● Now run this program (eventfd) as follows:

● From /sys/fs/cgroup/memory/0:

./eventfd cgroup.event_control memory.oom_control

From a second terminal, run:

cd /sys/fs/cgroup/memory/0/

echo $$ > /sys/fs/cgroup/memory/0/tasks

echo 10M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes

Then run a memory hog program.

When the OOM killer is invoked, you will get the messages from the eventfd userspace program, “oom event received from mem_cgroup”.


release_agent example

● The release_agent is invoked when the last process of a cgroup terminates.

● The cgroup's notify_on_release entry must be set to 1 for the release_agent to be invoked. (A newly created child cgroup inherits this flag from its parent, which is why setting it on the root below is enough.)

● A short script, /work/dev/t/date.sh:

#!/bin/sh
date >> /work/log.txt

Run a simple process that just sleeps forever; let's say its PID is pidSleepingProcess.

echo 1 > /sys/fs/cgroup/memory/notify_on_release

echo /work/dev/t/date.sh > /sys/fs/cgroup/memory/release_agent

mkdir /sys/fs/cgroup/memory/0/

echo pidSleepingProcess > /sys/fs/cgroup/memory/0/tasks

kill -9 pidSleepingProcess

This activates the release_agent, so we will see that the current date and time were written to /work/log.txt.


Systemd and cgroups

● Systemd – developed by Lennart Poettering, Kay Sievers, and others.

● A replacement for the Linux init scripts and daemon. Adopted by Fedora (since Fedora 15), openSUSE, and others.

● Udev was integrated into systemd.

● systemd uses control groups only for process grouping, not for anything else, such as allocating resources like block I/O bandwidth.

release_agent is a mount option on Fedora 18:

mount | grep systemd

cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)


cgroups-agent is a short program (cgroups-agent.c) whose only job is to send a D-Bus message via the D-Bus API:

dbus_message_new_signal() / dbus_message_append_args() / dbus_connection_send()

systemd Lightweight Containers, a new feature in Fedora 19:

https://fedoraproject.org/wiki/Features/SystemdLightweightContainers


ls /sys/fs/cgroup/systemd/system

abrtd.service               crond.service           rpcbind.service
abrt-oops.service           cups.service            rsyslog.service
abrt-xorg.service           dbus.service            sendmail.service
accounts-daemon.service     firewalld.service       smartd.service
atd.service                 getty@.service          sm-client.service
auditd.service              iprdump.service         sshd.service
bluetooth.service           iprinit.service         systemd-fsck@.service
cgroup.clone_children       iprupdate.service       systemd-journald.service
cgroup.event_control        ksmtuned.service        systemd-logind.service
cgroup.procs                mcelog.service          systemd-udevd.service
colord.service              NetworkManager.service  tasks
configure-printer@.service  notify_on_release       udisks2.service
console-kit-daemon.service  polkit.service          upower.service

We have 34 services here.


Example for the bluetooth systemd entry:

ls /sys/fs/cgroup/systemd/system/bluetooth.service/

cgroup.clone_children  cgroup.event_control  cgroup.procs  notify_on_release  tasks

cat /sys/fs/cgroup/systemd/system/bluetooth.service/tasks

709

There are services which have more than one pid in the tasks control file.


● With Fedora 18, the default location of the cgroup mount is /sys/fs/cgroup

● We have 9 controllers (cpu and cpuacct share one mount point), plus the named systemd hierarchy:

● /sys/fs/cgroup/blkio

● /sys/fs/cgroup/cpu,cpuacct

● /sys/fs/cgroup/cpuset

● /sys/fs/cgroup/devices

● /sys/fs/cgroup/freezer

● /sys/fs/cgroup/memory

● /sys/fs/cgroup/net_cls

● /sys/fs/cgroup/perf_event

● /sys/fs/cgroup/systemd

● During boot, systemd parses /sys/fs/cgroup and mounts all entries.


/proc/cgroups

In Fedora 18, cat /proc/cgroups gives:

#subsys_name    hierarchy    num_cgroups    enabled
cpuset          2            1              1
cpu             3            37             1
cpuacct         3            37             1
memory          4            1              1
devices         5            1              1
freezer         6            1              1
net_cls         7            1              1
blkio           8            1              1
perf_event      9            1              1


Libcgroup

libcgroup is a library that abstracts the control group file system in Linux.

The libcgroup-tools package provides tools for performing cgroups actions.

Ubuntu: apt-get install cgroup-bin (tried on Ubuntu 12.10)

Fedora: yum install libcgroup

cgcreate creates a new cgroup; cgset sets parameters for given cgroup(s); and cgexec runs a task in specified control groups.

Example:

cgcreate -g cpuset:/test

cgset -r cpuset.cpus=1 /test

cgset -r cpuset.mems=0 /test

cgexec -g cpuset:/test bash


One of the advantages of the cgroups framework is that it is simple to add kernel modules that work with it. There are only two callbacks that we must implement, css_alloc() and css_free(), and there is no need to patch the kernel unless you do something special. Thus, net/core/netprio_cgroup.c is only 322 lines of code and net/sched/cls_cgroup.c is 332 lines of code.


Checkpoint/Restart

Checkpointing is the operation of saving the state of a group of processes to a single file or to several files.

Restart is the operation of restoring these processes at some future time by reading and parsing that file or files.

Attempts to merge Checkpoint/Restart into the Linux kernel failed:

Attempts to merge the CKPT of OpenVZ failed.

Oren Laadan spent about three years implementing checkpoint/restart in the kernel; this code was not merged either.

Checkpoint and Restore In Userspace (CRIU)

● A project of OpenVZ

● sponsored and supported by Parallels.

Uses some kernel patches.

http://criu.org/Main_Page


● Workman (workload management):

It aims to provide high-level resource allocation and management. It is implemented as a library but provides bindings for more languages (it depends on the GObject framework, which allows all the library APIs to be exposed to non-C languages like Perl, Python, JavaScript and Vala).

https://gitorious.org/workman/pages/Home

● PaxControlGroups – a document:

● Tries to define precautions that software or users can take to avoid breaking or confusing other users of the cgroup filesystem.

http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups

● aka "How to behave nicely in the cgroupfs trees"


Note: in this presentation, we referred to two userspace packages, iproute and util-linux. The examples are based on the most recent git source code of these packages.

You can check namespaces and cgroups support on your machine by running:

lxc-checkconfig (from the lxc package)

In Fedora 18 and Ubuntu 13.04, there is no support for user namespaces, although they run kernel 3.8.


● On Android - Samsung Mini Galaxy:

– cat /proc/mounts | grep cgroup

none /acct cgroup rw,relatime,cpuacct 0 0

none /dev/cpuctl cgroup rw,relatime,cpu 0 0


Links

Namespaces in operation series by Michael Kerrisk, January 2013:

part 1: namespaces overview
http://lwn.net/Articles/531114/

part 2: the namespaces API
http://lwn.net/Articles/531381/

part 3: PID namespaces
http://lwn.net/Articles/531419/

part 4: more on PID namespaces
http://lwn.net/Articles/532748/

part 5: User namespaces
http://lwn.net/Articles/532593/

part 6: more on user namespaces
http://lwn.net/Articles/540087/


Links - contd

Stepping closer to practical containers: "syslog" namespaces
http://lwn.net/Articles/527342/

● tree /sys/fs/cgroup/

● Devices implementation.

● Serge Hallyn nsexec


Capabilities - appendix

include/uapi/linux/capability.h

CAP_CHOWN
CAP_DAC_OVERRIDE
CAP_DAC_READ_SEARCH
CAP_FOWNER
CAP_FSETID
CAP_KILL
CAP_SETGID
CAP_SETUID
CAP_SETPCAP
CAP_LINUX_IMMUTABLE
CAP_NET_BIND_SERVICE
CAP_NET_BROADCAST
CAP_NET_ADMIN
CAP_NET_RAW
CAP_IPC_LOCK
CAP_IPC_OWNER
CAP_SYS_MODULE
CAP_SYS_RAWIO
CAP_SYS_CHROOT
CAP_SYS_PTRACE
CAP_SYS_PACCT
CAP_SYS_ADMIN
CAP_SYS_BOOT
CAP_SYS_NICE
CAP_SYS_RESOURCE
CAP_SYS_TIME
CAP_SYS_TTY_CONFIG
CAP_MKNOD
CAP_LEASE
CAP_AUDIT_WRITE
CAP_AUDIT_CONTROL
CAP_SETFCAP
CAP_MAC_OVERRIDE
CAP_MAC_ADMIN
CAP_SYSLOG
CAP_WAKE_ALARM
CAP_BLOCK_SUSPEND

See: man 8 setcap / man 8 getcap


Summary

● Namespaces

– Implementation

– UTS namespace

– Network Namespaces

● Example

– PID namespaces

● cgroups

– Cgroups and kernel namespaces

– CGROUPS VFS

– CPUSET

– cpuset example

– release_agent example

– memcg

– Notification API

– devices

– Libcgroup

● Checkpoint/Restart


Links

cgroups kernel mailing list archive:
http://blog.gmane.org/gmane.linux.kernel.cgroups

cgroup git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git


Thank you!

