7/30/2019 Ftrace Tutorial
Ftrace Tutorial
Steven Rostedt ([email protected])
Introduction
- Kernel internal tracer
- Derived from the -rt patch Latency Tracer
- Plugin tracers:
  - ftrace: function tracer
  - irqsoff: interrupts-disabled latency
  - wakeup: latency of the highest-priority task to wake up
  - sched_switch: task context switches
  - (more)
- Ring buffer
- Saved traces (snapshots)
  - used to save maximum latency traces
The Debug File System
- Standard location: /sys/kernel/debug
- I prefer:
  mkdir /debug
  mount -t debugfs nodev /debug
- /etc/fstab:
  debugfs /sys/kernel/debug debugfs defaults 0 0
  debugfs /debug debugfs defaults 0 0
/debug/tracing
- available_tracers
- current_tracer
- tracing_enabled
- trace
- latency_trace
- trace_pipe
- iter_ctrl
- tracing_max_latency
- tracing_cpumask
- trace_entries
Selecting a tracer
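# cat /debug/tracing/available_tracers
wakeup preemptirqsoff preemptoff irqsoff ftrace sysprof sched_switch none
# echo wakeup > /debug/tracing/current_tracer
# cat /debug/tracing/current_tracer
wakeup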
The none tracer
- No tracer selected
- none is special: it is not a tracer
  echo none > /debug/tracing/current_tracer
Starting a trace
- Do not rely on tracing being enabled:
  echo 1 > /debug/tracing/tracing_enabled
- Note: make sure to have a space between the '1' and the '>'. Without it, the shell reads "1>" as a redirection of standard output and writes an empty line instead of a 1. This has burnt many a kernel programmer.
- The enabled state stays across tracers:
  echo 1 > /debug/tracing/tracing_enabled
  echo ftrace > /debug/tracing/current_tracer
  echo irqsoff > /debug/tracing/current_tracer
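The same writes can be made from C. Below is a minimal sketch; `write_tracing_file()` and `read_tracing_file()` are hypothetical helper names, not part of any kernel or library API, and the directory argument assumes the /debug mount point used in these slides. Opening with O_WRONLY | O_CREAT | O_TRUNC mirrors what the shell does for `>`, and passing the value as a string sidesteps the `1>` redirection trap entirely.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write val to tracing_dir/name, the C equivalent of
 * "echo val > tracing_dir/name". Returns 0 on success, -1 on error.
 */
int write_tracing_file(const char *tracing_dir, const char *name, const char *val)
{
	char path[512];
	size_t len = strlen(val);
	ssize_t n;
	int fd;

	if (snprintf(path, sizeof(path), "%s/%s", tracing_dir, name) >= (int)sizeof(path))
		return -1;		/* path would be truncated */
	/* Same open flags the shell uses for '>' */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	n = write(fd, val, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}

/*
 * Read tracing_dir/name into buf (NUL-terminated, trailing newline removed).
 * Returns the number of bytes kept, or -1 on error.
 */
ssize_t read_tracing_file(const char *tracing_dir, const char *name, char *buf, size_t size)
{
	char path[512];
	ssize_t n;
	int fd;

	if (size == 0 ||
	    snprintf(path, sizeof(path), "%s/%s", tracing_dir, name) >= (int)sizeof(path))
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, size - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	if (n > 0 && buf[n - 1] == '\n')
		buf[--n] = '\0';	/* drop the newline the kernel usually appends */
	return n;
}
```

For example, `write_tracing_file("/debug/tracing", "tracing_enabled", "1")` starts tracing and `read_tracing_file("/debug/tracing", "current_tracer", buf, sizeof(buf))` reports the selected tracer.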
Stopping a trace
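- echo 0 > /debug/tracing/tracing_enabled
  (do not forget that space!)
- Or in a program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int trace_fd;

int main(int argc, char *argv[])
{
	[...]
	/* Open once at startup so that stopping later is a single write() */
	trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
	if (trace_fd < 0) {
		perror("open tracing_enabled");
		return 1;
	}
	[...]
	if (condition_hit()) {	/* condition_hit(): your own check (not shown) */
		/* Stop tracing right at the event, so the trace ends there */
		write(trace_fd, "0", 1);
	}
	[...]
}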
Reading the Output
- latency_trace
- trace
- trace_pipe
Latency Trace Output
# tracer: irqsoff
#
irqsoff latency trace v1.1.5 on 2.6.26-tip
--------------------------------------------------------------------
latency: 971 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-----------------
| task: swapper-0 (uid:0 nice:20 policy:0 rt_prio:0)
-----------------
=> started at: acpi_os_acquire_lock
=> ended at: cpuidle_idle_call
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 1d..1 1us!: _spin_lock_irqsave (acpi_os_acquire_lock)
<idle>-0 1d..1 971us : acpi_idle_enter_bm (cpuidle_idle_call)
<idle>-0 1d..2 972us : trace_hardirqs_on (cpuidle_idle_call)
Various outputs
<idle>-0 1d.h2 1335164us : tick_sched_timer (__run_hrtimer)
<idle>-0 0.Ns2 1386686us+: _spin_lock_irq (run_timer_softirq)
<idle>-0 1d.H4 1388217us : ktime_get_ts (ktime_get)
bash-3498 1.... 1576794us : rw_verify_area (vfs_write)
bash-3498 1d..4 120768us+: 0:140:R + 3096:120:S gnome-terminal
bash-3498 1d..3 120796us!: 3498:120:S ==> 0:140:R
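Each flags column in these lines packs five fields into five characters, as the latency_trace header describes: CPU#, irqs-off (`d`), need-resched (`N`), hardirq/softirq (`h`, `s`, or `H` for a hard interrupt that arrived inside a softirq) and preempt-depth (`.` for zero). A small C decoder makes that layout explicit. It is a sketch that assumes a single-digit CPU column, as in these samples; `struct lat_flags` and `parse_lat_flags()` are illustrative names, not kernel code.

```c
#include <stdbool.h>

/* Decoded form of a latency_trace flags column such as "1d..1" */
struct lat_flags {
	int cpu;
	bool irqs_off;		/* 'd' */
	bool need_resched;	/* 'N' */
	char irq_ctx;		/* 'h' hardirq, 's' softirq, 'H' hardirq in softirq, '.' none */
	int preempt_depth;	/* shown as '.' when 0 */
};

/* Returns 0 on success, -1 if s is not a well-formed flags column. */
int parse_lat_flags(const char *s, struct lat_flags *f)
{
	if (s[0] < '0' || s[0] > '9')
		return -1;
	f->cpu = s[0] - '0';

	if (s[1] != 'd' && s[1] != '.')
		return -1;
	f->irqs_off = (s[1] == 'd');

	if (s[2] != 'N' && s[2] != '.')
		return -1;
	f->need_resched = (s[2] == 'N');

	switch (s[3]) {
	case 'h': case 's': case 'H': case '.':
		f->irq_ctx = s[3];
		break;
	default:
		return -1;
	}

	if (s[4] == '.')
		f->preempt_depth = 0;
	else if (s[4] >= '0' && s[4] <= '9')
		f->preempt_depth = s[4] - '0';
	else
		return -1;
	return 0;
}
```

Applied to `0.Ns2` above, it reports CPU 0, interrupts enabled, a reschedule pending, running in a softirq, with a preempt depth of 2.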
trace output
<idle>-0 [01] 1977.853298: read_hpet
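The trace format is `task-pid [cpu] timestamp: function`. Because task names such as `migration/0` or `gnome-terminal` may themselves contain `-`, a parser should split the pid off at the last dash before the CPU column. A minimal C sketch; the struct and function names are hypothetical, and the timestamp is read as seconds.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct trace_line {
	char comm[32];
	int pid;
	int cpu;
	double timestamp;	/* seconds */
	char func[64];
};

/* Parse "<idle>-0 [01] 1977.853298: read_hpet". Returns 0 on success, -1 otherwise. */
int parse_trace_line(const char *line, struct trace_line *t)
{
	const char *start = line + strspn(line, " ");	/* skip alignment padding */
	const char *bracket = strstr(start, " [");
	const char *dash = NULL;
	size_t comm_len;

	if (!bracket)
		return -1;
	/* The pid follows the LAST '-' of the "comm-pid" token */
	for (const char *p = start; p < bracket; p++)
		if (*p == '-')
			dash = p;
	if (!dash || dash == start)
		return -1;
	comm_len = (size_t)(dash - start);
	if (comm_len >= sizeof(t->comm))
		return -1;
	memcpy(t->comm, start, comm_len);
	t->comm[comm_len] = '\0';
	t->pid = atoi(dash + 1);

	if (sscanf(bracket, " [%d] %lf: %63s", &t->cpu, &t->timestamp, t->func) != 3)
		return -1;
	return 0;
}
```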
iter_ctrl
print-parent sym-offset sym-addr verbose raw hex binary block stacktrace sched-tree
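Options are toggled by writing the option name to iter_ctrl, or the name prefixed with `no` to clear it (for example `echo stacktrace > /debug/tracing/iter_ctrl` and `echo nostacktrace > /debug/tracing/iter_ctrl`). The sketch below models that as a bitmask. The bit order simply follows the list above and is an assumption, not the kernel's internal layout; `apply_iter_ctrl()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Bit i of the flags word corresponds to opt_names[i] (assumed ordering). */
static const char *const opt_names[] = {
	"print-parent", "sym-offset", "sym-addr", "verbose", "raw",
	"hex", "binary", "block", "stacktrace", "sched-tree",
};
#define NR_OPTS (sizeof(opt_names) / sizeof(opt_names[0]))

/*
 * Apply one write to iter_ctrl: "name" sets the option, "noname" clears it.
 * Returns 0 on success, -1 for an unknown option (flags left unchanged).
 */
int apply_iter_ctrl(unsigned long *flags, const char *cmd)
{
	int neg = strncmp(cmd, "no", 2) == 0;
	const char *name = neg ? cmd + 2 : cmd;

	for (size_t i = 0; i < NR_OPTS; i++) {
		if (strcmp(name, opt_names[i]) == 0) {
			if (neg)
				*flags &= ~(1UL << i);
			else
				*flags |= 1UL << i;
			return 0;
		}
	}
	return -1;
}
```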
Using iter_ctrl
<idle>-0 [01] 2975.463936: tick_program_event
The tracers
sched_switch ftrace wakeup irqsoff preemptoff preemptirqsoff
Available Tracers?
wakeup preemptirqsoff preemptoff irqsoff ftrace sysprof sched_switch none
sched_switch
- Traces task wakeups
- Traces task context switches
bash-3498 [01] 5459.824565: 0:140:R + 7971:120:R
<idle>-0 [00] 5459.824836: 0:140:R ==> 7971:120:R
bash-3498 [01] 5459.824984: 3498:120:S ==> 0:140:R
<idle>-0 [01] 5459.825342: 0:140:R ==> 7971:120:R
ls-7971 [00] 5459.825380: 7971:120:R + 3: 0:S
ls-7971 [00] 5459.825384: 7971:120:R ==> 3: 0:R
migration/0-3 [00] 5459.825401: 3: 0:S ==> 0:140:R
ls-7971 [01] 5459.825565: 7971:120:R + 598:115:S
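sched_switch entries describe each task as `pid:prio:state`, with `+` for a wakeup and `==>` for a context switch. A hedged C sketch of decoding one entry; the names are illustrative, and `prio` is the kernel-internal priority (120 corresponds to nice 0, and the idle task shows as 140 in these samples).

```c
#include <stdio.h>
#include <string.h>

struct task_state {
	int pid;
	int prio;	/* kernel-internal priority; 120 is nice 0 */
	char state;	/* e.g. 'R' runnable, 'S' sleeping */
};

struct sched_event {
	int is_switch;	/* 1 for "==>" (context switch), 0 for "+" (wakeup) */
	struct task_state from, to;
};

/* Parse "3498:120:S ==> 0:140:R" or "0:140:R + 7971:120:R". Returns 0 or -1. */
int parse_sched_event(const char *s, struct sched_event *ev)
{
	char op[4];

	/* %d skips leading blanks, so padded fields like "3: 0:S" also parse */
	if (sscanf(s, " %d:%d:%c %3s %d:%d:%c",
		   &ev->from.pid, &ev->from.prio, &ev->from.state, op,
		   &ev->to.pid, &ev->to.prio, &ev->to.state) != 7)
		return -1;
	if (strcmp(op, "==>") == 0)
		ev->is_switch = 1;
	else if (strcmp(op, "+") == 0)
		ev->is_switch = 0;
	else
		return -1;
	return 0;
}
```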
stacktrace
- An iter_ctrl option that affects the tracing itself
bash-3498 [01] 6216.772637: 0:140:R + 8495:120:R
bash-3498 [01] 6216.772639: do_fork
ftrace - function tracer
- Traces every non-inline function
- Other functions are not traced:
  - functions annotated with notrace
  - files built with CFLAGS_REMOVE_... = -pg in their Makefile
- Must have /proc/sys/kernel/ftrace_enabled=1
- Appears in most other tracers
- Very verbose
init-1 [00] 6710.079562: _spin_lock
Latency Tracers
- Stores the last maximum latency trace
  - wakeup: scheduling latency of RT tasks
  - irqsoff: interrupts off
  - preemptoff: preemption off
  - preemptirqsoff: interrupts and/or preemption off
- tracing_max_latency: holds the maximum latency recorded so far
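The "keep only the worst case" behavior can be modelled in a few lines: a new trace replaces the saved one only when its latency exceeds the current tracing_max_latency, and writing 0 to that file lets the next trace win. This is a user-space model under those assumptions; the struct and functions are hypothetical, and the real tracer also preserves a snapshot of the trace buffer at that point.

```c
/* User-space model of how a latency tracer keeps only its worst trace. */
struct max_tracker {
	unsigned long max_latency_us;	/* plays the role of tracing_max_latency */
	int saved_trace_id;		/* hypothetical handle for the saved trace */
};

/* Returns 1 if this trace becomes the new saved maximum, 0 otherwise. */
int report_latency(struct max_tracker *m, unsigned long latency_us, int trace_id)
{
	if (latency_us <= m->max_latency_us)
		return 0;			/* not worse than what is already saved */
	m->max_latency_us = latency_us;
	m->saved_trace_id = trace_id;	/* the real tracer keeps a snapshot here */
	return 1;
}

/* Model of "echo 0 > /debug/tracing/tracing_max_latency". */
void reset_max_latency(struct max_tracker *m)
{
	m->max_latency_us = 0;
}
```

Without the reset, a single bad latency early on (say the 971 us irqsoff trace shown earlier) would hide every later, smaller one.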
wakeup - sched latency
- Only traces RT tasks (use LatencyTop for non-RT tasks)
- Records and traces the maximum latency an RT task took from wakeup until it was scheduled
- Remember to reset tracing_max_latency:
  echo 0 > /debug/tracing/tracing_max_latency
Wakeup without function tracing
# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.26-tip
--------------------------------------------------------------------
latency: 9 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-----------------
| task: migration/1-7663 (uid:0 nice:-5 policy:1 rt_prio:99)
-----------------
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
usleep-10237 1d..2 2us+: try_to_wake_up (wake_up_process)
usleep-10237 1d..3 9us : schedule (preempt_schedule)
With function tracing
# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.26-tip
--------------------------------------------------------------------
latency: 19 us, #18/18, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-----------------
| task: usleep-10133 (uid:0 nice:0 policy:1 rt_prio:10)
-----------------
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 0d.h4 1us : try_to_wake_up (wake_up_process)
<idle>-0 0dNh4 2us : _spin_unlock_irqrestore (try_to_wake_up)
<idle>-0 0dNh3 3us : _spin_lock (__run_hrtimer)
<idle>-0 0dNh4 4us : _spin_unlock (hrtimer_interrupt)
<idle>-0 0dNh3 5us : tick_program_event (hrtimer_interrupt)
[...]
<idle>-0 0.N.2 14us : _spin_lock_irqsave (hrtick_set)
<idle>-0 0dN.3 15us+: _spin_unlock_irqrestore (hrtick_set)
<idle>-0 0dN.2 16us : _spin_lock (schedule)
irqsoff
local_irq_save(flags);
[...]
preempt_disable();
[...]
local_irq_restore(flags);
[...]
preempt_enable();
(irqsoff times local_irq_save() through local_irq_restore())
preemptoff
local_irq_save(flags);
[...]
preempt_disable();
[...]
local_irq_restore(flags);
[...]
preempt_enable();
(preemptoff times preempt_disable() through preempt_enable())
preemptirqsoff
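local_irq_save(flags);
[...]
preempt_disable();
[...]
local_irq_restore(flags);
[...]
preempt_enable();
(preemptirqsoff times local_irq_save() through preempt_enable(): the whole span where interrupts and/or preemption are off)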
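The three tracers differ only in which window they time. The sketch below replays a timeline of hypothetical events and reports the longest window each tracer would record: irqsoff while interrupts are disabled, preemptoff while the preempt count is nonzero, and preemptirqsoff while either holds. It is a simplified model of the measurement (single CPU, no nested irq-save state), not the kernel's implementation; all names are illustrative.

```c
#include <stddef.h>

enum ev_type { IRQ_OFF, IRQ_ON, PREEMPT_OFF, PREEMPT_ON };

struct event {
	unsigned long t_us;	/* timestamp in microseconds */
	enum ev_type type;
};

/* Longest window each tracer would report, in microseconds */
struct windows {
	unsigned long irqsoff, preemptoff, preemptirqsoff;
};

struct windows measure(const struct event *ev, size_t n)
{
	struct windows w = {0, 0, 0};
	int irqs_off = 0, preempt_depth = 0;
	unsigned long irq_start = 0, preempt_start = 0, either_start = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long t = ev[i].t_us;
		int was_off = irqs_off || preempt_depth > 0;
		int is_off;

		switch (ev[i].type) {
		case IRQ_OFF:			/* local_irq_save() */
			if (!irqs_off)
				irq_start = t;
			irqs_off = 1;
			break;
		case IRQ_ON:			/* local_irq_restore() */
			if (irqs_off && t - irq_start > w.irqsoff)
				w.irqsoff = t - irq_start;
			irqs_off = 0;
			break;
		case PREEMPT_OFF:		/* preempt_disable(), may nest */
			if (preempt_depth++ == 0)
				preempt_start = t;
			break;
		case PREEMPT_ON:		/* preempt_enable() */
			if (preempt_depth > 0 && --preempt_depth == 0 &&
			    t - preempt_start > w.preemptoff)
				w.preemptoff = t - preempt_start;
			break;
		}

		/* preemptirqsoff: from the first disable until both are enabled again */
		is_off = irqs_off || preempt_depth > 0;
		if (!was_off && is_off)
			either_start = t;
		else if (was_off && !is_off && t - either_start > w.preemptirqsoff)
			w.preemptirqsoff = t - either_start;
	}
	return w;
}
```

For the sequence in the code above, with interrupts off at 0 us, preemption off at 10 us, interrupts back on at 30 us and preemption back on at 50 us, the model reports 30 us, 40 us and 50 us respectively.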
trace_entries
- Not enough data recorded?
- Too much data recorded?
- Run-time configurable
- Must be changed while the none tracer is selected, or the write fails with -EBUSY
- The number is the number of entries, but the buffers are allocated in pages. If more entries fit on a page that was allocated to hold the requested entries, the rest of that page is filled with entries as well.
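The rounding described above is simple arithmetic: with `page_size / entry_size` entries per page, the request is rounded up to whole pages. The sketch assumes entries are packed with no per-page header; `effective_entries()` is a hypothetical name, and the 4096-byte page and 128-byte entry in the usage note are made-up numbers, since the real entry size depends on the kernel's trace entry layout.

```c
#include <stddef.h>

/*
 * Number of entries actually available after rounding a request up to
 * whole pages. Returns 0 if an entry does not fit in a page at all.
 */
size_t effective_entries(size_t requested, size_t page_size, size_t entry_size)
{
	size_t per_page = page_size / entry_size;
	size_t pages;

	if (per_page == 0)
		return 0;
	pages = (requested + per_page - 1) / per_page;	/* round up */
	return pages * per_page;
}
```

For example, with 32 entries per page (4096 / 128), a request for 100 entries needs 4 pages, so 128 entries end up available.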
Dynamic Ftrace (the fun begins!)
- Produces no measurable overhead when tracing is disabled
- Requires the kernel thread ftraced to check for more updates
- Calls kstop_machine to execute text modification
  - modifying code text is not safe in an SMP environment otherwise
- /debug/tracing/ftraced_enabled
How it works?
With the gcc profiler switch -pg, every non-inline function calls mcount
00001adb <do_fork>:
1adb: 55 push %ebp
1adc: 89 e5 mov %esp,%ebp
1ade: 57 push %edi
1adf: 56 push %esi
1ae0: 53 push %ebx
1ae1: 83 ec 1c sub $0x1c,%esp
1ae4: e8 fc ff ff ff call 1ae5
1ae5: R_386_PC32 mcount
1ae9: 89 c3 mov %eax,%ebx
1aeb: 89 c7 mov %eax,%edi
1aed: 81 e3 00 00 00 02 and $0x2000000,%ebx
1af3: 89 ce mov %ecx,%esi
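Two pieces of arithmetic explain this listing. The call at 1ae4 still holds the unrelocated displacement `fc ff ff ff` (-4), so objdump shows its target as 1ae5 until the R_386_PC32 relocation to mcount is applied. And the return address mcount receives (1ae9) minus the 5-byte call gives the call site, 1ae4, which is offset 0x9 from the function start at 1adb. A minimal C sketch; the function names are illustrative.

```c
#include <stdint.h>

#define MCOUNT_INSN_SIZE 5	/* i386: 0xe8 opcode + 32-bit displacement */

/* Target of the "call rel32" instruction stored at address ip. */
uint32_t call_target(uint32_t ip, const uint8_t insn[MCOUNT_INSN_SIZE])
{
	/* little-endian displacement, relative to the next instruction */
	uint32_t rel = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
		       (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;

	return ip + MCOUNT_INSN_SIZE + rel;	/* unsigned wraparound handles negative rel */
}

/* Call site recovered from the return address, like "subl $MCOUNT_INSN_SIZE, %eax". */
uint32_t call_site(uint32_t ret_addr)
{
	return ret_addr - MCOUNT_INSN_SIZE;
}
```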
Non-dynamic i386 mcount
ENTRY(mcount)
cmpl $ftrace_stub, ftrace_trace_function
jnz trace
.globl ftrace_stub
ftrace_stub:
ret
/* taken from glibc */
trace:
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %eax
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax
call *ftrace_trace_function
popl %edx
popl %ecx
popl %eax
jmp ftrace_stub
END(mcount)
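In C terms, the non-dynamic mcount above does the following: compare ftrace_trace_function against ftrace_stub and return at once when no tracer is installed; otherwise call the tracer with the call site (return address minus MCOUNT_INSN_SIZE) and the parent's return address. This is a user-space model that borrows the kernel's symbol names for readability; `mcount_model()` and `counting_tracer()` are hypothetical additions for the example.

```c
#define MCOUNT_INSN_SIZE 5

typedef void (*trace_func_t)(unsigned long ip, unsigned long parent_ip);

static void ftrace_stub(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
}

/* Stays pointed at the stub until a tracer is installed */
static trace_func_t ftrace_trace_function = ftrace_stub;

/* Hypothetical tracer: counts calls and remembers the last call site */
static unsigned long hits, last_ip, last_parent;

static void counting_tracer(unsigned long ip, unsigned long parent_ip)
{
	hits++;
	last_ip = ip;
	last_parent = parent_ip;
}

/* What mcount does, given its return address and the parent's return address */
static void mcount_model(unsigned long ret_addr, unsigned long parent_ip)
{
	if (ftrace_trace_function == ftrace_stub)
		return;			/* no tracer: one compare and out */
	ftrace_trace_function(ret_addr - MCOUNT_INSN_SIZE, parent_ip);
}
```

The cost when tracing is off is that single compare on every function call, which is what dynamic ftrace sets out to remove.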
Dynamic i386 mcount
ENTRY(mcount)
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %eax
subl $MCOUNT_INSN_SIZE, %eax
.globl mcount_call
mcount_call:
call ftrace_stub
popl %edx
popl %ecx
popl %eax
ret
END(mcount)
Call ftrace_record_ip
ENTRY(mcount)
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %eax
subl $MCOUNT_INSN_SIZE, %eax
.globl mcount_call
mcount_call:
call ftrace_record_ip
popl %edx
popl %ecx
popl %eax
ret
END(mcount)
ftrace_record_ip
[Diagram: ftrace_record_ip stores the call site do_fork+0x9 in a hash]
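A sketch of the recording step: the first time a call site reaches ftrace_record_ip, its address goes into a hash, and later hits from the same site are ignored. This model uses a fixed-size open-addressing table purely for illustration; `record_ip()`, `RECORD_SLOTS` and the table are hypothetical and do not reflect the kernel's own hash layout.

```c
#include <stddef.h>

#define RECORD_SLOTS 64		/* hypothetical table size for the sketch */

static unsigned long record_table[RECORD_SLOTS];	/* 0 marks an empty slot */
static size_t nr_records;

/*
 * Remember each call site once. Returns 1 if ip is new, 0 if it was
 * already recorded, -1 if ip is 0 or the table is full.
 */
int record_ip(unsigned long ip)
{
	size_t slot = (ip >> 2) % RECORD_SLOTS;	/* cheap hash of the address */

	if (ip == 0)
		return -1;
	for (size_t probe = 0; probe < RECORD_SLOTS; probe++) {
		if (record_table[slot] == ip)
			return 0;			/* seen before: nothing to do */
		if (record_table[slot] == 0) {
			record_table[slot] = ip;
			nr_records++;
			return 1;
		}
		slot = (slot + 1) % RECORD_SLOTS;	/* linear probing */
	}
	return -1;
}
```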
ftraced
[Diagram: ftraced takes do_fork+0x9 from the hash onto a list, then calls kstop_machine to modify the code at do_fork+0x9 into a nop]
nop
00001adb <do_fork>:
1adb: 55 push %ebp
1adc: 89 e5 mov %esp,%ebp
1ade: 57 push %edi
1adf: 56 push %esi
1ae0: 53 push %ebx
1ae1: 83 ec 1c sub $0x1c,%esp
1ae4:
1ae9: 89 c3 mov %eax,%ebx
1aeb: 89 c7 mov %eax,%edi
1aed: 81 e3 00 00 00 02 and $0x2000000,%ebx
1af3: 89 ce mov %ecx,%esi
Starting of ftrace
00001adb <do_fork>:
1adb: 55 push %ebp
1adc: 89 e5 mov %esp,%ebp
1ade: 57 push %edi
1adf: 56 push %esi
1ae0: 53 push %ebx
1ae1: 83 ec 1c sub $0x1c,%esp
1ae4:
1ae5: R_386_PC32
1ae9: 89 c3 mov %eax,%ebx
1aeb: 89 c7 mov %eax,%edi
1aed: 81 e3 00 00 00 02 and $0x2000000,%ebx
1af3: 89 ce mov %ecx,%esi
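The code modification in these slides comes down to rewriting one 5-byte instruction. The sketch below builds a `call rel32` for a given site and target, and replaces bytes only after checking that the site still holds the expected old instruction. It works on a plain byte array, so it needs none of the kstop_machine protection the kernel uses; the 5-byte nop (`0f 1f 44 00 00`), the helper names and the target addresses in the usage note are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCOUNT_INSN_SIZE 5

/* A 5-byte x86 nop ("nopl 0x0(%eax,%eax,1)"), chosen for this sketch */
static const uint8_t nop5[MCOUNT_INSN_SIZE] = {0x0f, 0x1f, 0x44, 0x00, 0x00};

/* Encode "call target" as it would appear at address ip. */
void make_call(uint32_t ip, uint32_t target, uint8_t out[MCOUNT_INSN_SIZE])
{
	uint32_t rel = target - (ip + MCOUNT_INSN_SIZE);	/* wraps for backward calls */

	out[0] = 0xe8;						/* call rel32 opcode */
	for (int i = 0; i < 4; i++)
		out[1 + i] = (uint8_t)(rel >> (8 * i));		/* little-endian */
}

/*
 * Replace the instruction at address ip (inside a copy of the text that
 * starts at text_base) only if it currently holds old_insn.
 * Returns 0 on success, -1 if the bytes were not what we expected.
 */
int modify_code(uint8_t *text, uint32_t text_base, uint32_t ip,
		const uint8_t old_insn[MCOUNT_INSN_SIZE],
		const uint8_t new_insn[MCOUNT_INSN_SIZE])
{
	uint8_t *p = text + (ip - text_base);

	if (memcmp(p, old_insn, MCOUNT_INSN_SIZE) != 0)
		return -1;
	memcpy(p, new_insn, MCOUNT_INSN_SIZE);
	return 0;
}
```

With the addresses from the listing, `make_call(0x1ae4, 0x1ae5, buf)` reproduces the `e8 fc ff ff ff` bytes; turning the call into a nop and later into a call to a (hypothetical) ftrace_caller address are then two `modify_code()` calls.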
ftrace_caller
ENTRY(ftrace_caller)
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %eax
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax
.globl ftrace_call
ftrace_call:
call ftrace_stub
popl %edx
popl %ecx
popl %eax
.globl ftrace_stub
ftrace_stub:
ret
END(ftrace_caller)
Registering an ftrace caller
ENTRY(ftrace_caller)
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %eax
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax
.globl ftrace_call
ftrace_call:
call
popl %edx
popl %ecx
popl %eax
.globl ftrace_stub
ftrace_stub:
ret
END(ftrace_caller)
Selective function tracer
- Tracing is dynamically enabled
- We already have a list of the functions that need to be traced
- Why not filter which functions we trace?
Picking what functions to trace
- /debug/tracing/available_filter_functions
- /debug/tracing/set_ftrace_filter
- /debug/tracing/set_ftrace_notrace
available_filter_functions
filelock_init
__rcu_read_lock
kmem_cache_create
notifier_call_chain
down_write
__rcu_read_unlock
_spin_lock_irq
_spin_unlock_irq
_spin_lock
__kmalloc
set_ftrace_filter
# tracer: ftrace
#
#  TASK-PID   CPU#    TIMESTAMP  FUNCTION
# | | | | |
ls-5652 [00] 2320.450897: sys_open
set_ftrace_notrace
- Modified just like set_ftrace_filter
- Acts like a notrace annotation added to the function
- The function will not be traced even if it is in set_ftrace_filter
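The rule combining the two files fits in a few lines: set_ftrace_notrace always wins, and an empty set_ftrace_filter means every function is eligible. A hedged sketch using exact names only (wildcards omitted); `should_trace()` and `in_list()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool in_list(const char *fn, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(fn, list[i]) == 0)
			return true;
	return false;
}

/* notrace always wins; an empty filter means "trace everything". */
bool should_trace(const char *fn,
		  const char *const *filter, size_t nfilter,
		  const char *const *notrace, size_t nnotrace)
{
	if (in_list(fn, notrace, nnotrace))
		return false;
	return nfilter == 0 || in_list(fn, filter, nfilter);
}
```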
set_ftrace_* wildcards
- Prefix: echo 'sys_*' > /debug/tracing/set...
- Postfix: echo '*lock' > /debug/tracing/set...
- Included: echo '*device*' > /debug/tracing/set...
- Anything else: use grep on available_filter_functions
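The three wildcard forms can be matched without a general glob engine. A sketch (the function name is hypothetical): a `*` anywhere other than the start or end is compared literally, so any other pattern falls through to an exact comparison, consistent with the advice to use grep for everything else.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Match fn against one set_ftrace_* pattern:
 *   "sys_*"    prefix       "*lock"   suffix
 *   "*device*" substring    otherwise exact match
 */
bool ftrace_glob_match(const char *pattern, const char *fn)
{
	size_t plen = strlen(pattern), flen = strlen(fn);
	size_t lead = (plen > 0 && pattern[0] == '*') ? 1 : 0;
	size_t trail = (plen > lead && pattern[plen - 1] == '*') ? 1 : 0;
	const char *core = pattern + lead;	/* pattern without the stars */
	size_t clen = plen - lead - trail;

	if (clen > flen)
		return false;
	if (lead && trail) {			/* "*device*" */
		for (size_t i = 0; i + clen <= flen; i++)
			if (memcmp(fn + i, core, clen) == 0)
				return true;
		return false;
	}
	if (lead)				/* "*lock" */
		return memcmp(fn + flen - clen, core, clen) == 0;
	if (trail)				/* "sys_*" */
		return memcmp(fn, core, clen) == 0;
	return strcmp(pattern, fn) == 0;	/* no wildcard */
}
```

Against the available_filter_functions sample, `*lock` selects `_spin_lock` but not `_spin_lock_irq`, while `*rcu*` selects both `__rcu_read_lock` and `__rcu_read_unlock`.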
Todo:
- ftrace dump on OOPS
- Change the sleep interval of the ftraced thread
- Use the CPU clock (aka TSC) for interrupt and preemption latency traces
- Option to force per-CPU trace interleaving integrity
- printk-like hooks (for debugging purposes only)
- Hooks for tuna to show in the oscilloscope