Linux Kernel Library - Reusing Monolithic Kernel

Date post: 10-Feb-2017
Author: hajime-tazaki
Linux Kernel Library: Reusing Monolithic Kernel. Hajime Tazaki, IIJ Innovation Institute. 2016/07, AIST seminar vol.2
  • 1

    Linux Kernel Library: Reusing Monolithic Kernel

    Hajime Tazaki, IIJ Innovation Institute


    AIST seminar vol.2

  • 2 . 1

    LKL in a nutshell
    Linux kernel library

    a library of Linux
    Octavian Purdila (Intel)'s work (since 2007?)
    Proposed on LKML (Nov. 2015)

    2809 LoC (as of Apr. 2016)
    https://lwn.net/Articles/662953/

    Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.


  • 2 . 2

    LKL (cont'd)
    hardware-independent architecture (arch/lkl)
    provides an interface to the underlying environment

    outsource dependencies
      clock, memory allocation, scheduler
      running on Windows, Linux, FreeBSD

    simplify I/O operation of devices
      virtio host implementation
      could use the driver (of virtio) in Linux

    Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.

  • 2 . 3

    Benefit
    less ossification of new features

    operating system personality
    userspace library has less deployment cost

    Well-matured code base
    (e.g.) Linux kernel running in userspace
    small kernel, a bunch of libraries
    but in a different shape

  • Any problem in computer science can be solved with another level of indirection.

    (Wheeler and/or Lampson)

    img src: https://www.flickr.com/photos/thomasclaveirole/305073153

  • 2 . 4

    2 . 5

    What is reusing monolithic kernel ?
    Anykernel: originally in NetBSD rump kernel

    We define an anykernel to be an organization of kernel code which allows the kernel's unmodified drivers to be run in various configurations such as application libraries and microkernel style servers, and also as part of a monolithic kernel. -- Kantee

    Using the (unmodified) high-quality code base of a monolithic kernel
    in a different environment, in a different shape,
    by gluing additional stuff

  • 2 . 6

    2 . 7

    (a bit of) History
    rump: 2007 (NetBSD)
    LKL: 2007 (Linux)
    DCE/LibOS: 2008 (Linux/FreeBSD)
    LibOS/LKL revival: 2015

    LibOS merged to LKL

  • http://news.mynavi.jp/news/2015/03/25/285/
    https://news.ycombinator.com/item?id=9259292


  • 2 . 8

    2 . 9

    LKL vs. LibOS



  • LKL vs. LibOS (cont'd)
    LoC:

    arch/lkl (LKL) < arch/lib (LibOS)
    diff: the amount of stub code

    commons
      no modification to the original Linux code
      description of kernel context (by POSIX thread)
      outsourced resources (clock, memory, scheduler)
      CPU independent architecture

    diffs
      LibOS: implemented with higher API (timer, irq, kthread) by pthread
      LKL: implement IRQ, kthread, timer with pthread in lower layer

  • 2 . 103 . 1


  • 2 . 10

    3 . 2


    1. Host backend (host_ops)
    2. CPU independent arch. (arch/lkl)
    3. Application interface

  • 1. host backend
    environment dependent part

    unify an interface across different platforms
    (rump-hypercall like)

    device interface with Virtio
      block device: disk image
      networking: TAP, raw socket, DPDK, VDE
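The idea of a host backend can be sketched as a table of function pointers that the kernel-side code calls instead of touching the host directly. The sketch below is only an illustration: `demo_host_ops`, its members, `posix_ops`, and `demo_kernel_boot` are hypothetical names, and the real LKL `host_ops` structure has different members and signatures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, trimmed-down operations table (not the real LKL layout). */
struct demo_host_ops {
    void *(*mem_alloc)(size_t size);       /* outsourced memory allocation */
    void (*mem_free)(void *ptr);
    unsigned long long (*time_ns)(void);   /* outsourced clock source */
};

/* POSIX backend: the clock callback forwards to clock_gettime(). */
static unsigned long long posix_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL
         + (unsigned long long)ts.tv_nsec;
}

/* One table per platform; a Windows or FreeBSD port would fill in
 * the same members with its own functions. */
const struct demo_host_ops posix_ops = {
    .mem_alloc = malloc,
    .mem_free  = free,
    .time_ns   = posix_time_ns,
};

/* "Kernel side" code only sees the table, never the host API.
 * Returns 0 on success, -1 on failure. */
int demo_kernel_boot(const struct demo_host_ops *ops)
{
    assert(ops != NULL);

    void *page = ops->mem_alloc(4096);
    if (page == NULL)
        return -1;

    unsigned long long t0 = ops->time_ns();
    unsigned long long t1 = ops->time_ns();
    ops->mem_free(page);

    /* A monotonic clock must never go backwards. */
    return t1 >= t0 ? 0 : -1;
}
```

Calling `demo_kernel_boot(&posix_ops)` exercises the whole path. Porting to another platform then means writing a new table, not changing the caller.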

  • 3 . 33 . 4

    2. CPU independent architecture
    architecture (arch/lkl)

    transparent architecture bind (as CPU arch)

    require no modification to the other

    2800 LoC
    thread information (struct thread_info)
    irq, timer, syscall handler
    access to underlying layer by host_ops

  • 3 . 3

    3 . 5

    3. Application interface

    1. use exposed API (LKL syscall)
    2. use host libc (LD_PRELOAD)
    3. extend (alternative) libc

  • 3 . 6

    API 1: use exposed API (LKL syscall)

    call entry points of LKL kernel
    lkl_sys_open(), lkl_sys_socket()

    almost same as ordinary syscalls
    return value, errno notification are different

    can use LKL syscall and host syscall simultaneously

    read ext4 file by lkl_sys_read() => write into host (Windows) by write()
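The difference in return value and errno notification can be illustrated with a small adapter. This is a sketch under an assumption: that an LKL entry point reports failure the in-kernel way, as a negative error code in the return value, whereas a libc wrapper returns -1 and sets `errno`. `demo_to_libc_style` is a hypothetical helper, not part of LKL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper (not an LKL function): converts a kernel-style
 * result (>= 0 on success, negative error code on failure) into the
 * libc style (-1 on failure, with errno set). */
long demo_to_libc_style(long kernel_ret)
{
    /* Linux kernel error values live in the range [-4095, -1]. */
    assert(kernel_ret >= -4095);

    if (kernel_ret < 0) {
        errno = (int)-kernel_ret;
        return -1;
    }
    return kernel_ret;
}
```

A caller mixing both worlds would pass the result of an LKL call through such an adapter before handing it to code that expects the libc convention.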

  • 3 . 7

    API 2: hijack host standard library
    dynamically replace symbols of host syscalls (of libc)

    LD_PRELOAD
    socket() => lkl_sys_socket()

    can use host binary (executable) as-is
    limitation of replaceable symbols
    needs syscall translation on non-linux host

  • 3 . 8

    API 3: extend (alternative) libc
    only call LKL syscall with our own libc
    also introduce as a virtual CPU architecture
    a program can link this instead of host libc

    can't access (underlying) host resources directly via this LKL syscall

    as a patch for musl libc

  • 3 . 9

    Use cases (applications)
    Use Case 1: instant kernel bypass
    Use Case 2: programs reusing kernel code in userspace
    Use Case 3: unikernel

  • 3 . 10

    Use Case 1: instant kernel bypass
    syscall redirection by LD_PRELOAD
    can use both LKL and host syscalls

    new feature without touching host kernel
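Since both LKL and host syscalls stay usable, an interposed wrapper has to decide, per call, which side should serve it. One possible way, sketched below, is to remember which descriptors were created by the library kernel and route on that. All names here (`demo_mark_lkl_fd`, `demo_forget_fd`, `demo_route`) are hypothetical and this is not taken from the LKL hijack library.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_FD 1024

enum demo_target { DEMO_HOST, DEMO_LKL };

/* Hypothetical bookkeeping: true if the descriptor was created by the
 * library kernel, false if it belongs to the host kernel. */
static bool demo_owned_by_lkl[DEMO_MAX_FD];

/* Record a descriptor returned by a library-kernel call such as socket(). */
void demo_mark_lkl_fd(int fd)
{
    assert(fd >= 0 && fd < DEMO_MAX_FD);
    demo_owned_by_lkl[fd] = true;
}

/* Drop the record again, for example when the descriptor is closed. */
void demo_forget_fd(int fd)
{
    assert(fd >= 0 && fd < DEMO_MAX_FD);
    demo_owned_by_lkl[fd] = false;
}

/* Decide where a wrapper such as read() or write() should forward to.
 * Unknown or out-of-range descriptors fall through to the host. */
enum demo_target demo_route(int fd)
{
    if (fd >= 0 && fd < DEMO_MAX_FD && demo_owned_by_lkl[fd])
        return DEMO_LKL;
    return DEMO_HOST;
}
```

Defaulting to the host keeps unmodified behaviour for everything the library kernel did not create, such as stdin and stdout.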


  • 3 . 11

    Use Case 2: programs reusing kernel code in userspace

    use kernel code without porting
    mount a filesystem w/o root privilege

    can use both LKL and host syscalls

    e.g., access to disk image of ext4 format on Windows
    1. open disk image (CreateFile())
    2. mount (lkl_sys_mount())
    3. read a file in the disk image (lkl_sys_read())
    4. write a file to Windows side (WriteFile())

  • 3 . 12

    Use Case 3: Unikernel
    single application containing LKL

    python + LKL, nginx + LKL
    only LKL syscalls available

    musl libc extension
    rump hypercall (frankenlibc)

    running on non-OS environment
    (on Xen Mini-OS via rumprun)

    Work in progress

    - http://www.linux.com/news/enterprise/cloud-computing/751156-are-cloud-operating-


  • 3 . 13

    demos with linux kernel library

    Unikernel on Linux (ping6 command with embedded kernel library)

    Unikernel on qemu-arm (helloworld)

  • 4 . 1

    Kernel bypass/userspace networking

  • 4 . 2

    Network Stack
    Why in kernel space ?

    the cost of a packet was high in that era ('70s)
    now much cheaper

    Getting fat (matured) after decades

    code path is longer (and slower)
    hard to add new features
    faced unknown issues

    img src: http://www.makelinux.net/kernel_map/

  • 4 . 3

    Alternate network stacks
    lwip (2002~)
    Arrakis [OSDI '14]
    IX [OSDI '14]
    MegaPipe [OSDI '12]
    mTCP [NSDI '14]
    SandStorm [SIGCOMM '14]
    uTCP [CCR '14]
    rumpkernel [ATC '09]
    FastSocket [ASPLOS '16]
    SolarFlare (2007~?)
    StackMap [ATC '16]
    libuinet (2013~)
    SeaStar (2014~)
    Snabb Switch (2012~)

  • 4 . 4

    Motivations

    Socket API sucks
      StackMap, MegaPipe, uTCP, SandStorm, IX
      New API: no benefit with existing applications

    Network stack in kernel space sucks
      FastSocket, mTCP, lwip (SolarFlare?)

    Compatibility is (also) important
      rumpkernel, libuinet, Arrakis, IX, SolarFlare

    Existing programming model sucks
      SeaStar

  • 4 . 5

    Techniques

    batching (syscall/NIC access)
      Arrakis, IX, MegaPipe, mTCP, SandStorm, uTCP

    Utilize feature-rich kernel stack
      rumpkernel, fastsocket, StackMap

    Porting to userspace stack
      libuinet, SandStorm

    Kernel bypass (userspace network stack)
      mTCP, SandStorm, uTCP, rumpkernel, libuinet, lwip, SeaStar

    bypass technique itself
      netmap, PF_RING, raw socket, Intel DPDK

    Connection locality (multi-core scalability)
      SeaStar, MegaPipe, mTCP, fastsocket, .....

  • 4 . 6

    Implementation

    Full scratch
      lwip (Arrakis, IX, SolarFlare?), mTCP, uTCP, SeaStar

    Porting based
      libuinet, SandStorm

    New API
      MegaPipe, StackMap

    Anykernel
      rumpkernel, (LKL)

  • 4 . 7

    What's still missing ?
    some solve problems by specialization

    avoiding generality tax
    performance w/ specialization vs. more features w/ generalization
    e.g., fewer TCP stack features, new API breaks support for existing applications.

    specialized vs. generalized
    generalization often involves indirection
    indirection usually introduces complexity (Wheeler/Lampson)

    performant and generalized ?

  • 5 . 1

    Performance study

  • 5 . 2

    Conditions
    ThinkStation P310 x2

    CPU: Intel Core i7-6700 CPU @ 3.40GHz (8 cores)
    Memory: 32GB
    NIC: X540-T2

    Linux 4.4.6-301 (x86_64) on Fedora 23
    Linux bridge (X540 + tap/raw socket)
    no DPDK... can't with hijack, etc

    netperf (git ~v2.7.0)
    netserver (native)
    netperf (varied)

  • 5 . 3

    Conditions (cont'd)
    combinations

    netperf (sendmmsg) + host stack (native)
    + hijack library, native thread (hijack)
    + frankenlibc/lkl, green thread (lkl-musl)
    netperf (sendmmsg) + lkl extension + frankenlibc (lkl-musl (skb prealloc))

    pinned to a processor
    using the taskset command

    disable all offload features (tso/gso/gro, rx/tx cksum)

  • TCP_RR (netperf)

  • 5 . 4

    UDP_STREAM (netperf)

  • 5 . 5

    UDP_STREAM (pps, netperf)

  • 5 . 6

    TCP_STREAM (netperf)

  • 5 . 7

    5 . 8

    (ref.) LibOS results (as of Feb.2015)

    1024 bytes UDP, own-crafted tool


  • 5 . 9

    Observations (of benchmark)
    Native thread vs Green thread

    better TCP_RR w/ native thread (pthread)
    better TCP_STREAM/UDP_STREAM w/ green thread???

    avoiding dynamic allocation contributes a lot
    penalized over MTU-sized payload on host stack (?)

  • 6 . 1

    Summary
    Morphing monolithic kernel into an Anykernel
    Various use cases

    Userspace network stack (kernel bypass)
    Unikernel

    Performance study in progress


  • 6 . 2

    Reference

    Linux Kernel Library
    Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.

    Rumpkernel (dissertation)
    Kantee, Flexible Operating System Internals: The Design and Implementation of the Anykernel and Rump Kernels, Ph.D. Thesis, 2012.

    Linux LibOS in general
    Tazaki et al., Direct Code Execution: Revisiting Library OS Architecture for Reproducible Network Experiments, CoNEXT 2013.

    (LibOS in general)



  • 7 . 1


  • 7 . 4

    Recent Updates

  • 7 . 5

    Updates (diff to lkl)
    (musl) libc integration
    rump hypercall interface

    via frankenlibc tools (for POSIX environment)
    via rumprun framework (for baremetal/xen/kvm environment)

    more applications
    netperf (signal handling, etc)
    nginx
    ghc (Haskell runtime)

    performance study

  • 7 . 6

    libc integration
    standard lib for LKL

    all syscalls go directly to LKL
    application can use LKL transparently
    no special modifications or hijack needed

    based on musl libc
    introduce new (sub) architecture lkl

  • rump hypercall interface
    replacement of LKL host_ops

    or yet-another new host environment (rump)

    has two thread primitives

    pthread-based (as LKL does)
    ucontext-based (more efficient on non-MP)

    can reduce

    the effort of host_ops maintenance
    complexity of tall abstraction turtle
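To make the ucontext-based (green thread) primitive concrete, here is a minimal sketch of cooperative switching with `getcontext`, `makecontext`, and `swapcontext`. It assumes a glibc-based Linux host (these calls are obsolescent in POSIX and not provided by every libc), and the `demo_` names are hypothetical; it is not the frankenlibc or rump implementation.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t demo_main_ctx;     /* context of the caller ("scheduler") */
static ucontext_t demo_thread_ctx;   /* context of the green thread */
static char demo_stack[64 * 1024];   /* private stack for the green thread */
static int demo_steps;               /* work done so far by the green thread */

/* Green thread body: does one unit of work, then yields back to the
 * caller by swapping contexts. No kernel thread is involved. */
static void demo_green_thread(void)
{
    for (int i = 0; i < 3; i++) {
        demo_steps++;
        swapcontext(&demo_thread_ctx, &demo_main_ctx);
    }
}

/* Runs the green thread until it has completed three steps.
 * Returns how many times it was resumed, or -1 on setup failure. */
int demo_run_green_thread(void)
{
    demo_steps = 0;

    if (getcontext(&demo_thread_ctx) != 0)
        return -1;
    demo_thread_ctx.uc_stack.ss_sp = demo_stack;
    demo_thread_ctx.uc_stack.ss_size = sizeof demo_stack;
    demo_thread_ctx.uc_link = &demo_main_ctx;  /* where to go if the body returns */
    makecontext(&demo_thread_ctx, demo_green_thread, 0);

    int resumes = 0;
    while (demo_steps < 3) {
        /* Save our context and jump into the green thread. */
        swapcontext(&demo_main_ctx, &demo_thread_ctx);
        resumes++;
    }
    assert(demo_steps == 3);
    return resumes;
}
```

Each switch here is an ordinary function call in userspace, which is one plausible reason such threads are cheap on a single processor. They do not run in parallel, so the pthread-based primitive remains the option for multiprocessor use.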

  • 7 . 77 . 8

    rump hypercall (cont'd)
    integration of

    libc (musl for LKL, netbsd libc for rumpkernel)
    rump hypercall (on linux, freebsd, netbsd, qemu-arm, spike)
    host (platform) support code

    frankenlibc
    has two namespaced libc(s)
    hypercall implementation can use libc

    provides
    a libc.a
    cross-build toolchains (rumprun-cc, etc)

  • 7 . 7

    7 . 9



    execution (with rexec launcher)


    rexec executable [disk image file] [NIC] -- [executable specific options]

  • 7 . 10