eBPF Trace from Kernel to Userspace

Date post: 16-Apr-2017
Upload: suselab
eBPF Trace from Kernel to Userspace Gary Lin SUSE Labs Software Engineer Technology Sharing Day 2016
Transcript
Page 1: eBPF Trace from Kernel to Userspace

eBPF Trace from Kernel to Userspace

Gary Lin
SUSE Labs
Software Engineer

Technology Sharing Day

2016

Page 2: eBPF Trace from Kernel to Userspace

Tracer

Page 3: eBPF Trace from Kernel to Userspace

tick_nohz_idle_enter set_cpu_sd_state_idle up_write __tick_nohz_idle_enter ktime_get uprobe_mmap read_hpet vma_set_page_prot vma_wants_writenotify

rcu_needs_cpu fput get_next_timer_interrupt _raw_spin_lock hrtimer_get_next_event _raw_spin_lock_irqsave _raw_spin_unlock_irqrestore syscall_trace_leave _raw_write_unlock_irqrestore __audit_syscall_exit path_put dput mntput

up_write

rax: 0x0000000000000000
rbx: 0xffff88012b5a5a28
rcx: 0xffff8800987c18e0
rdx: 0x0000000000000000
rsi: 0xffff88012b439f20
rdi: 0xffff88012b464628
rbp: 0xffff8800959e3d98

Page 4: eBPF Trace from Kernel to Userspace

kprobe

Kernel

Userspace

uprobe

/sys/kernel/debug/tracing/kprobe_events

/sys/kernel/debug/tracing/uprobe_events
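
Both files accept one probe definition per line. The sketch below composes such lines, assuming the basic `p:EVENT SYMBOL` and `p:EVENT PATH:OFFSET` forms from the kernel's kprobe and uprobe tracer documentation. The event names and the uprobe offset are made up for illustration, and nothing is written unless `register` is called as root.

```python
import os

TRACEFS = "/sys/kernel/debug/tracing"  # directory shown on the slide


def kprobe_def(event, symbol, is_return=False):
    """Build one line for kprobe_events: 'p:EVENT SYMBOL' or 'r:EVENT SYMBOL'."""
    kind = "r" if is_return else "p"
    return f"{kind}:{event} {symbol}"


def uprobe_def(event, path, offset, is_return=False):
    """Build one line for uprobe_events: 'p:EVENT PATH:OFFSET'."""
    kind = "r" if is_return else "p"
    return f"{kind}:{event} {path}:{offset:#x}"


def register(filename, definition):
    """Append a definition to kprobe_events or uprobe_events (needs root)."""
    with open(os.path.join(TRACEFS, filename), "a") as f:
        f.write(definition + "\n")


if __name__ == "__main__":
    # Hypothetical event names; 0x1234 is a placeholder, not a real symbol offset.
    print(kprobe_def("my_kfree_skb", "kfree_skb"))
    print(uprobe_def("my_malloc", "/lib64/libc.so.6", 0x1234))
```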

Page 5: eBPF Trace from Kernel to Userspace

eBPF

Page 6: eBPF Trace from Kernel to Userspace

BPF?

Page 7: eBPF Trace from Kernel to Userspace

Berkeley Packet Filter

Page 8: eBPF Trace from Kernel to Userspace

BPF

No RedBPF Program

Page 9: eBPF Trace from Kernel to Userspace

The BSD Packet Filter: A New Architecture for User-level Packet Capture

December 19, 1992

Page 10: eBPF Trace from Kernel to Userspace

SCO lawsuit, August 2003

Page 11: eBPF Trace from Kernel to Userspace

Old

Page 12: eBPF Trace from Kernel to Userspace

Stable

Page 13: eBPF Trace from Kernel to Userspace

BPF ASM

ldh [12]
jne #0x800, drop
ldb [23]
jneq #1, drop
# get a random uint32 number
ld rand
mod #4
jneq #1, drop
ret #-1
drop: ret #0
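
Read as a filter, the listing keeps roughly one in four ICMP-over-IPv4 packets and drops everything else. The sketch below restates that logic in plain Python, assuming an Ethernet frame so that offset 12 is the EtherType and offset 23 is the IP protocol byte. The function name and the injectable `rand32` parameter are illustrative, not part of any BPF API.

```python
import random
import struct

ETH_P_IP = 0x0800      # EtherType for IPv4
IPPROTO_ICMP = 1


def sample_icmp(packet, rand32=lambda: random.getrandbits(32)):
    """Return True to keep the packet (ret #-1), False to drop it (ret #0)."""
    if len(packet) < 24:
        return False                                      # too short to hold byte 23
    (ethertype,) = struct.unpack_from("!H", packet, 12)   # ldh [12]
    if ethertype != ETH_P_IP:                             # jne #0x800, drop
        return False
    if packet[23] != IPPROTO_ICMP:                        # ldb [23]; jneq #1, drop
        return False
    return rand32() % 4 == 1                              # ld rand; mod #4; jneq #1, drop
```

Passing a fixed `rand32` makes the sampling step deterministic, which is convenient when checking the filter against hand-built frames.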

Page 14: eBPF Trace from Kernel to Userspace

BPF Bytecode

struct sock_filter code[] = {
    { 0x28,  0,  0, 0x0000000c },
    { 0x15,  0,  8, 0x000086dd },
    { 0x30,  0,  0, 0x00000014 },
    { 0x15,  2,  0, 0x00000084 },
    { 0x15,  1,  0, 0x00000006 },
    { 0x15,  0, 17, 0x00000011 },
    { 0x28,  0,  0, 0x00000036 },
    { 0x15, 14,  0, 0x00000016 },
    { 0x28,  0,  0, 0x00000038 },
    { 0x15, 12, 13, 0x00000016 },
    ...
};
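
Each entry in the array is one `struct sock_filter`: a 16-bit opcode, two 8-bit jump offsets (taken when the test is true or false), and a 32-bit constant. A minimal decoder sketch for the three opcodes visible on the slide follows. The table and function names are illustrative, and the output format is only loosely modelled on BPF assembly.

```python
import struct

# (mnemonic, operand format) for the opcodes that appear on the slide.
OPCODES = {
    0x28: ("ldh", "[{k}]"),     # BPF_LD  | BPF_H   | BPF_ABS
    0x30: ("ldb", "[{k}]"),     # BPF_LD  | BPF_B   | BPF_ABS
    0x15: ("jeq", "#{k:#x}"),   # BPF_JMP | BPF_JEQ | BPF_K
}


def pack(code, jt, jf, k):
    """Pack one struct sock_filter {u16 code; u8 jt; u8 jf; u32 k} in native byte order."""
    return struct.pack("=HBBI", code, jt, jf, k)


def disasm(code, jt, jf, k):
    """Render one instruction; jt/jf are relative skips counted in instructions."""
    name, fmt = OPCODES[code]
    text = f"{name} {fmt.format(k=k)}"
    if name == "jeq":
        text += f", jt {jt}, jf {jf}"
    return text


if __name__ == "__main__":
    for entry in [(0x28, 0, 0, 0x0000000C), (0x15, 0, 8, 0x000086DD)]:
        print(disasm(*entry))
```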

Page 15: eBPF Trace from Kernel to Userspace

Virtual Machine
kind of

Page 16: eBPF Trace from Kernel to Userspace

BPF JIT

Page 17: eBPF Trace from Kernel to Userspace

BPF Bytecode → BPF JIT → Native Machine Code

Page 18: eBPF Trace from Kernel to Userspace

$ find arch/ -name bpf_jit*
arch/sparc/net/bpf_jit_comp.c
arch/sparc/net/bpf_jit_asm.S
arch/sparc/net/bpf_jit.h
arch/arm/net/bpf_jit_32.c
arch/arm/net/bpf_jit_32.h
arch/arm64/net/bpf_jit_comp.c
arch/arm64/net/bpf_jit.h
arch/powerpc/net/bpf_jit_comp.c
arch/powerpc/net/bpf_jit_asm.S
arch/powerpc/net/bpf_jit.h
arch/s390/net/bpf_jit_comp.c
arch/s390/net/bpf_jit.S
arch/s390/net/bpf_jit.h
arch/mips/net/bpf_jit.c
arch/mips/net/bpf_jit_asm.S
arch/mips/net/bpf_jit.h
arch/x86/net/bpf_jit_comp.c
arch/x86/net/bpf_jit.S

Page 19: eBPF Trace from Kernel to Userspace

Stable and Efficient

Page 20: eBPF Trace from Kernel to Userspace

eBPF

Page 21: eBPF Trace from Kernel to Userspace

Extended BPF

Page 22: eBPF Trace from Kernel to Userspace

eBPF

userspace
kernel

eBPF Program
BPF_PROG_LOAD

At most 4096 instructions

Page 23: eBPF Trace from Kernel to Userspace

Extended Registers
eBPF Verifier
eBPF Map
Probe Event

Page 24: eBPF Trace from Kernel to Userspace

Extended Registers
eBPF Verifier
eBPF Map
Probe Event

Page 25: eBPF Trace from Kernel to Userspace

Classic BPF: 32 bit
Extended BPF: 64 bit

Page 26: eBPF Trace from Kernel to Userspace

Classic BPF: A, X (2)
Extended BPF: R0 – R9 (10)

R10 (read-only)

Page 27: eBPF Trace from Kernel to Userspace

For x86_64 JIT

R0 → rax
R1 → rdi
R2 → rsi
R3 → rdx
R4 → rcx
R5 → r8
R6 → rbx
R7 → r13
R8 → r14
R9 → r15
R10 → rbp

Page 28: eBPF Trace from Kernel to Userspace

BPF Calling Convention

● R0

Return value from in-kernel function, and exit value for eBPF program

● R1 – R5

Arguments from eBPF program to in-kernel function

● R6 – R9

Callee saved registers that in-kernel function will preserve

● R10

Read-only frame pointer to access stack
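
The smallest program that follows this convention sets R0 and exits. As a sketch of what that looks like on the wire, the code below hand-encodes `r0 = 0; exit` using the 8-byte eBPF instruction layout (opcode, destination and source register nibbles, 16-bit offset, 32-bit immediate). The helper name `insn` and the constant names are made up here; little-endian layout is assumed.

```python
import struct

BPF_ALU64_MOV_K = 0xB7   # BPF_ALU64 | BPF_MOV | BPF_K: dst = imm
BPF_JMP_EXIT = 0x95      # BPF_JMP | BPF_EXIT: return with the value in R0


def insn(opcode, dst=0, src=0, off=0, imm=0):
    """Encode one 8-byte eBPF instruction (little-endian layout).

    Fields: u8 opcode, 4-bit dst_reg (low nibble), 4-bit src_reg (high nibble),
    s16 off, s32 imm.
    """
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)


# r0 = 0; exit  -> the program's exit value is whatever is left in R0.
PROGRAM = insn(BPF_ALU64_MOV_K, dst=0, imm=0) + insn(BPF_JMP_EXIT)

if __name__ == "__main__":
    print(PROGRAM.hex())
```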

Page 29: eBPF Trace from Kernel to Userspace

Extended Registers
eBPF Verifier
eBPF Map
Probe Event

Page 30: eBPF Trace from Kernel to Userspace

Two-Step Verification

Page 31: eBPF Trace from Kernel to Userspace

Step 1

Directed Acyclic Graph Check

Page 32: eBPF Trace from Kernel to Userspace

Loops

Unreachable Instructions

Page 33: eBPF Trace from Kernel to Userspace

Loops

Unreachable Instructions
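
Both conditions fall out of a single depth-first walk over the control-flow graph: an edge back to an instruction that is still on the walk's stack is a loop, and any instruction never visited is unreachable. The sketch below shows the idea on a toy instruction format (`op`, `jmp`, `jcond`, `exit` tuples) invented for this example. It is not the kernel's verifier code.

```python
def successors(prog, i):
    """Indices that control can reach directly from instruction i."""
    kind = prog[i][0]
    if kind == "exit":
        return []
    if kind == "jmp":            # unconditional jump
        return [prog[i][1]]
    if kind == "jcond":          # conditional: fall through or jump
        return [i + 1, prog[i][1]]
    return [i + 1]               # ordinary instruction falls through


def check_cfg(prog):
    """Return a list of problems; an empty list means a DAG with no dead code."""
    if not prog:
        return ["empty program"]
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = [WHITE] * len(prog)
    problems = []
    color[0] = GREY
    stack = [(0, iter(successors(prog, 0)))]
    while stack:
        node, it = stack[-1]
        nxt = next(it, None)
        if nxt is None:                    # all successors explored
            color[node] = BLACK
            stack.pop()
        elif not 0 <= nxt < len(prog):
            problems.append(f"control flow leaves the program after insn {node}")
        elif color[nxt] == GREY:           # edge to an ancestor: a loop
            problems.append(f"back-edge from insn {node} to {nxt}")
        elif color[nxt] == WHITE:
            color[nxt] = GREY
            stack.append((nxt, iter(successors(prog, nxt))))
    problems += [f"unreachable insn {i}" for i, c in enumerate(color) if c == WHITE]
    return problems
```

For example, `check_cfg([("op",), ("jcond", 0), ("exit",)])` reports the back-edge from instruction 1 to 0, while `check_cfg([("jmp", 2), ("op",), ("exit",)])` reports instruction 1 as unreachable.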

Page 34: eBPF Trace from Kernel to Userspace

Step 2

Simulate the Execution

Page 35: eBPF Trace from Kernel to Userspace

Read a never-written register

Do arithmetic on two valid pointers

Load/store registers of invalid types

Read stack before writing data into stack

Page 36: eBPF Trace from Kernel to Userspace

Read a never-written register

Do arithmetic on two valid pointers

Load/store registers of invalid types

Read stack before writing data into stack
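
The first of these checks can be pictured as tracking which registers hold a defined value while stepping through the program. The sketch below does this for a straight-line toy program only, with an instruction format invented for the example. It assumes, as in the kernel, that R1 (context) and R10 (frame pointer) are the only registers defined on entry, and it does not model pointer types or the stack.

```python
def verify_registers(prog):
    """Walk a straight-line toy program and reject reads of uninitialised registers.

    Each instruction is (op, dst, src): 'mov_imm' writes dst, 'mov' reads src,
    'add' reads dst and src, and 'exit' reads r0.
    """
    written = {1, 10}   # assumption: r1 = context, r10 = frame pointer at entry
    for pc, (op, dst, src) in enumerate(prog):
        reads = {"mov_imm": [], "mov": [src], "add": [dst, src], "exit": [0]}[op]
        for reg in reads:
            if reg not in written:
                return f"insn {pc}: r{reg} read before it was written"
        if op != "exit":
            if dst == 10:
                return f"insn {pc}: r10 is read-only"
            written.add(dst)
    return "ok"
```

A program that exits without ever setting R0 is rejected at its `exit` instruction, which ties this check back to the calling convention.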

Page 37: eBPF Trace from Kernel to Userspace

Extended Registers
eBPF Verifier
eBPF Map
Probe Event

Page 38: eBPF Trace from Kernel to Userspace

eBPF

userspace
kernel

User Program

Map BPF_MAP_*

Page 39: eBPF Trace from Kernel to Userspace

eBPF Map Types

● BPF_MAP_TYPE_HASH
● BPF_MAP_TYPE_ARRAY
● BPF_MAP_TYPE_PROG_ARRAY
● BPF_MAP_TYPE_PERF_EVENT_ARRAY

Page 40: eBPF Trace from Kernel to Userspace

eBPF Map Syscalls

● BPF_MAP_CREATE
● BPF_MAP_LOOKUP_ELEM
● BPF_MAP_UPDATE_ELEM
● BPF_MAP_DELETE_ELEM
● BPF_MAP_GET_NEXT_KEY
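
These five commands are enough to create, fill, and walk a map from userspace. The sketch below models their behaviour with a plain dictionary so the iteration pattern (get next key, then look it up) is visible. `ToyMap`, `EndOfMap`, and `dump` are names invented here, and Python exceptions stand in for the error codes the real `bpf()` calls return.

```python
class EndOfMap(Exception):
    """Stands in for the ENOENT that ends a BPF_MAP_GET_NEXT_KEY walk."""


class ToyMap:
    """Dict-backed stand-in for a BPF hash map with a fixed capacity."""

    def __init__(self, max_entries):          # BPF_MAP_CREATE
        self.max_entries = max_entries
        self.data = {}

    def lookup(self, key):                    # BPF_MAP_LOOKUP_ELEM
        return self.data[key]                 # KeyError plays the role of ENOENT

    def update(self, key, value):             # BPF_MAP_UPDATE_ELEM
        if key not in self.data and len(self.data) >= self.max_entries:
            raise OverflowError("map is full")
        self.data[key] = value

    def delete(self, key):                    # BPF_MAP_DELETE_ELEM
        del self.data[key]

    def get_next_key(self, key=None):         # BPF_MAP_GET_NEXT_KEY
        keys = list(self.data)
        # An unknown key restarts the walk from the first entry.
        start = keys.index(key) + 1 if key in self.data else 0
        if start >= len(keys):
            raise EndOfMap
        return keys[start]


def dump(m):
    """Walk a map the way a userspace tool does: next key, then lookup."""
    out, key = [], None
    while True:
        try:
            key = m.get_next_key(key)
        except EndOfMap:
            return out
        out.append((key, m.lookup(key)))
```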

Page 41: eBPF Trace from Kernel to Userspace

Extended Registers
eBPF Verifier
eBPF Map
Probe Event

Page 42: eBPF Trace from Kernel to Userspace

New ioctl request

PERF_EVENT_IOC_SET_BPF

Page 43: eBPF Trace from Kernel to Userspace

Kprobe

Page 44: eBPF Trace from Kernel to Userspace

BPF_PROG_LOAD

User Program

eBPF

userspace

kernel

Kernel Program

kprobe

Eventfd

fd

PERF_EVENT_IOC_SET_BPF

fd

Attach

Page 45: eBPF Trace from Kernel to Userspace

Registration

perf_tp_event_init()       kernel/events/core.c
perf_trace_init()          kernel/trace/trace_event_perf.c
perf_trace_event_init()    kernel/trace/trace_event_perf.c
perf_trace_event_reg()     kernel/trace/trace_event_perf.c

    ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);

kprobe_register()          kernel/trace/trace_kprobe.c
enable_trace_kprobe()      kernel/trace/trace_kprobe.c
enable_kprobe()            kernel/kprobes.c

Page 46: eBPF Trace from Kernel to Userspace

Attach

perf_ioctl()               kernel/events/core.c
_perf_ioctl()              kernel/events/core.c

    case PERF_EVENT_IOC_SET_BPF:
        return perf_event_set_bpf_prog(event, arg);

perf_event_set_bpf_prog()  kernel/events/core.c

    prog = bpf_prog_get(prog_fd);
    event->tp_event->prog = prog;
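
From userspace, the attach step is a single ioctl on the perf event fd with the program fd as its argument. The sketch below derives the request numbers from their definitions in include/uapi/linux/perf_event.h, assuming the generic ioctl encoding used on x86 and arm (some architectures, such as powerpc and mips, encode the direction bits differently). The `attach` helper is illustrative and expects both file descriptors to have been obtained elsewhere.

```python
import fcntl
import struct

# Generic Linux ioctl encoding (asm-generic/ioctl.h).
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_NONE, _IOC_WRITE = 0, 1


def _ioc(direction, type_char, nr, size):
    """Compose an ioctl request number from its four fields."""
    return ((direction << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT) |
            (ord(type_char) << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


# PERF_EVENT_IOC_ENABLE  = _IO ('$', 0)
# PERF_EVENT_IOC_SET_BPF = _IOW('$', 8, __u32)
PERF_EVENT_IOC_ENABLE = _ioc(_IOC_NONE, "$", 0, 0)
PERF_EVENT_IOC_SET_BPF = _ioc(_IOC_WRITE, "$", 8, struct.calcsize("=I"))


def attach(event_fd, prog_fd):
    """Attach a loaded eBPF program to a perf event fd, then enable the event."""
    fcntl.ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd)
    fcntl.ioctl(event_fd, PERF_EVENT_IOC_ENABLE, 0)
```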

Page 47: eBPF Trace from Kernel to Userspace

Dispatch Event

kprobe_dispatcher()        kernel/trace/trace_kprobe.c
kprobe_perf_func()         kernel/trace/trace_kprobe.c

    if (prog && !trace_call_bpf(prog, regs))
        return;

trace_call_bpf()           kernel/trace/bpf_trace.c
BPF_PROG_RUN()             include/linux/filter.h
__bpf_prog_run()           kernel/bpf/core.c

Page 48: eBPF Trace from Kernel to Userspace

kfree_skb(struct sk_buff *skb)
{
    if (unlikely(!skb))
        return;
    ….
}

kprobe

eBPF

BPF bytecode Read Map

BPF bytecode Map

BPF_PROG_LOAD BPF_MAP_*

userspace

kernel

bpf_tracer.c

Page 49: eBPF Trace from Kernel to Userspace

Uprobe

Page 50: eBPF Trace from Kernel to Userspace

BPF_PROG_LOAD

User Program

eBPF

userspace

kernel

Kernel Program

uprobe

Eventfd

fd

PERF_EVENT_IOC_SET_BPF

fd

Attach

Page 51: eBPF Trace from Kernel to Userspace

__libc_malloc(size_t bytes)
{
    arena_lookup(ar_ptr);
    arena_lock(ar_ptr, bytes);
    ….
}

uprobe

eBPF

BPF bytecode

BPF bytecode

userspace

kernel

bpf_tracer.c

glibc

Page 52: eBPF Trace from Kernel to Userspace

How to use eBPF?

Page 53: eBPF Trace from Kernel to Userspace

Linux Kernel >= 4.1

Page 54: eBPF Trace from Kernel to Userspace

Kernel Config

● CONFIG_BPF=y
● CONFIG_BPF_SYSCALL=y
● CONFIG_BPF_JIT=y
● CONFIG_HAVE_BPF_JIT=y
● CONFIG_BPF_EVENTS=y
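
A quick way to confirm these options on a running system is to scan the kernel's config file. The sketch below checks a config text for the five options listed on the slide. Where that text comes from (for example /boot/config-<release> or a decompressed /proc/config.gz) depends on the distribution, so the function takes the text as a parameter rather than guessing a path.

```python
import re

REQUIRED = ["CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT",
            "CONFIG_HAVE_BPF_JIT", "CONFIG_BPF_EVENTS"]


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in a kernel config."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    sample = "CONFIG_BPF=y\nCONFIG_BPF_SYSCALL=y\n# CONFIG_BPF_JIT is not set\n"
    print(missing_options(sample))
```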

Page 55: eBPF Trace from Kernel to Userspace

BPF ASM

Page 56: eBPF Trace from Kernel to Userspace

BPF ASM
Restricted C

Page 57: eBPF Trace from Kernel to Userspace

LLVM >= 3.7

Page 58: eBPF Trace from Kernel to Userspace

clang: -emit-llvm
llc: -march=bpf

Page 59: eBPF Trace from Kernel to Userspace

C code → (clang) → LLVM IR Bitcode → (llc) → BPF Bytecode
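
The two stages can be chained by piping clang's bitcode straight into llc. The sketch below builds both command lines and runs them with `subprocess`. The extra flags (`-O2`, `-c`, `-filetype=obj`) follow common practice for BPF builds and are assumptions beyond what the slide shows, as are the file names used in the example.

```python
import shutil
import subprocess


def build_commands(source, obj):
    """Return the clang and llc argument lists for a two-stage BPF build."""
    clang = ["clang", "-O2", "-emit-llvm", "-c", source, "-o", "-"]
    llc = ["llc", "-march=bpf", "-filetype=obj", "-o", obj]
    return clang, llc


def compile_bpf(source, obj):
    """Pipe clang's bitcode into llc; raises CalledProcessError on failure."""
    clang, llc = build_commands(source, obj)
    if not (shutil.which("clang") and shutil.which("llc")):
        raise RuntimeError("clang and llc (LLVM >= 3.7) must be on PATH")
    bitcode = subprocess.run(clang, check=True, stdout=subprocess.PIPE).stdout
    subprocess.run(llc, check=True, input=bitcode)   # llc reads bitcode from stdin


if __name__ == "__main__":
    # Hypothetical file names.
    for cmd in build_commands("bpf_tracer_kern.c", "bpf_tracer_kern.o"):
        print(" ".join(cmd))
```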

Page 60: eBPF Trace from Kernel to Userspace

User Program

eBPF

userspace

kernel

eBPF MAP

Kernel Program

As simple as possible

Whatever you want

Page 61: eBPF Trace from Kernel to Userspace

BPF Compiler Collection

Page 62: eBPF Trace from Kernel to Userspace

obs://Base:System/bcc

Page 63: eBPF Trace from Kernel to Userspace

C & Python Library

Built-in BPF compiler

Page 64: eBPF Trace from Kernel to Userspace

Hello World

from bcc import BPF

bpf_prog = """
void kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("Hello, World\\n");
}
"""

BPF(text=bpf_prog).trace_print()

Page 65: eBPF Trace from Kernel to Userspace

Access Map

In bitehist.c:

BPF_HISTOGRAM(dist);
dist.increment(bpf_log2l(req->__data_len / 1024));

In bitehist.py:

b = BPF(src_file = "bitehist.c")
b["dist"].print_log2_hist("kbytes")

Page 66: eBPF Trace from Kernel to Userspace

Access Map (Cont’)

# ./bitehist.py
Tracing... Hit Ctrl-C to end.
^C
     kbytes          : count     distribution
       0 -> 1        : 8        |******                                  |
       2 -> 3        : 0        |                                        |
       4 -> 7        : 51       |****************************************|
       8 -> 15       : 8        |******                                  |
      16 -> 31       : 1        |                                        |
      32 -> 63       : 3        |**                                      |
      64 -> 127      : 2        |*                                       |
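
The rows in this output are power-of-two buckets. The sketch below is a plain-Python model of that bucketing, assuming slot = floor(log2(value)) + 1 with 0 and 1 sharing the first slot, which is what the "0 -> 1" row suggests. The function names and the exact column widths are made up, so this is not bcc's own implementation.

```python
from collections import Counter


def log2_slot(value):
    """Slot index for a value: floor(log2(value)) + 1, with 0 and 1 sharing slot 1."""
    return max(value.bit_length(), 1)


def slot_range(slot):
    """Inclusive (low, high) range printed for a slot, e.g. slot 3 -> (4, 7)."""
    low, high = (1 << slot) >> 1, (1 << slot) - 1
    return (0, 1) if slot == 1 else (low, high)


def histogram(values, width=40):
    """Return one text row per slot, scaled so the fullest slot spans `width` stars."""
    counts = Counter(log2_slot(v) for v in values)
    peak = max(counts.values())
    lines = []
    for slot in range(1, max(counts) + 1):
        low, high = slot_range(slot)
        stars = "*" * int(counts[slot] * width / peak)
        lines.append(f"{low:>8} -> {high:<8} : {counts[slot]:<8} |{stars:<{width}}|")
    return lines


if __name__ == "__main__":
    # Sizes in kbytes chosen to mimic the first three rows of the output above.
    print("\n".join(histogram([0] * 8 + [4] * 51)))
```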

Page 67: eBPF Trace from Kernel to Userspace

memleak.py

if not kernel_trace:
    print("Attaching to malloc and free in pid %d, "
          "Ctrl+C to quit." % pid)
    bpf_program.attach_uprobe(name="c", sym="malloc",
                              fn_name="alloc_enter", pid=pid)
    bpf_program.attach_uretprobe(name="c", sym="malloc",
                                 fn_name="alloc_exit", pid=pid)
    bpf_program.attach_uprobe(name="c", sym="free",
                              fn_name="free_enter", pid=pid)
else:
    print("Attaching to kmalloc and kfree, Ctrl+C to quit.")
    bpf_program.attach_kprobe(event="__kmalloc", fn_name="alloc_enter")
    bpf_program.attach_kretprobe(event="__kmalloc", fn_name="alloc_exit")
    bpf_program.attach_kprobe(event="kfree", fn_name="free_enter")

Page 68: eBPF Trace from Kernel to Userspace

memleak.py (alloc_enter)

BPF_HASH(sizes, u64);
BPF_HASH(allocs, u64, struct alloc_info_t);

int alloc_enter(struct pt_regs *ctx, size_t size) {
    ...
    u64 pid = bpf_get_current_pid_tgid();
    u64 size64 = size;
    sizes.update(&pid, &size64);
    ...
}

Page 69: eBPF Trace from Kernel to Userspace

memleak.py (alloc_exit)

BPF_HASH(sizes, u64);
BPF_HASH(allocs, u64, struct alloc_info_t);

int alloc_exit(struct pt_regs *ctx) {
    u64 address = ctx->ax;
    u64 pid = bpf_get_current_pid_tgid();
    u64* size64 = sizes.lookup(&pid);
    struct alloc_info_t info = {0};

    if (size64 == 0)
        return 0; // missed alloc entry

    info.size = *size64;
    sizes.delete(&pid);

    info.timestamp_ns = bpf_ktime_get_ns();
    info.num_frames = grab_stack(ctx, &info) - 2;
    allocs.update(&address, &info);
    ...
}

Page 70: eBPF Trace from Kernel to Userspace

memleak.py (free)

BPF_HASH(sizes, u64);
BPF_HASH(allocs, u64, struct alloc_info_t);

int free_enter(struct pt_regs *ctx, void *address)
{
    u64 addr = (u64)address;
    struct alloc_info_t *info = allocs.lookup(&addr);
    if (info == 0)
        return 0;

    allocs.delete(&addr);
    ...
}
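
Taken together, the three handlers keep two tables: `sizes` remembers the size requested by the call in flight for each thread, and `allocs` records every address returned but not yet freed. The sketch below models that bookkeeping with plain dictionaries so the flow can be followed without a kernel. The class and method names mirror the handlers but are otherwise invented, and timestamps and stack traces are left out.

```python
class LeakTracker:
    """Plain-Python model of the three probe handlers and their two hash maps."""

    def __init__(self):
        self.sizes = {}    # pid -> size requested by the allocation in flight
        self.allocs = {}   # address -> size of a live allocation

    def alloc_enter(self, pid, size):      # probe on malloc entry
        self.sizes[pid] = size

    def alloc_exit(self, pid, address):    # return probe: return value is the address
        size = self.sizes.pop(pid, None)
        if size is None:
            return                          # missed the matching entry
        self.allocs[address] = size

    def free_enter(self, address):         # probe on free entry
        self.allocs.pop(address, None)

    def outstanding(self):
        """Allocations seen but not yet freed: the leak candidates."""
        return dict(self.allocs)


if __name__ == "__main__":
    t = LeakTracker()
    t.alloc_enter(pid=1, size=64)
    t.alloc_exit(pid=1, address=0x1000)
    t.alloc_enter(pid=1, size=32)
    t.alloc_exit(pid=1, address=0x2000)
    t.free_enter(0x1000)
    print(t.outstanding())
```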

Page 71: eBPF Trace from Kernel to Userspace

Demo

Page 72: eBPF Trace from Kernel to Userspace

Question?

Page 73: eBPF Trace from Kernel to Userspace

Thank You

Page 74: eBPF Trace from Kernel to Userspace

References

● Documentation/networking/filter.txt

● http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html

● https://suchakra.wordpress.com/2015/05/18/bpf-internals-i/

● https://suchakra.wordpress.com/2015/08/12/bpf-internals-ii/

● https://lkml.org/lkml/2013/9/30/627

● https://lwn.net/Articles/612878/

● https://lwn.net/Articles/650953/

● https://github.com/iovisor/bcc

