+ All Categories
Home > Technology > Troubleshooting Linux Kernel Modules And Device Drivers

Troubleshooting Linux Kernel Modules And Device Drivers

Date post: 10-May-2015
Category:
Upload: satpal-parmar
View: 16,024 times
Download: 9 times
Share this document with a friend
Description:
Troubleshooting Linux Kernel Modules And Device Drivers
Popular Tags:
35
08/05/2008 - Copyright 2008, The PTR Group, Inc. LinuxWorld-SF-1 Troubleshooting Troubleshooting Linux Kernel Modules Linux Kernel Modules and Device Drivers and Device Drivers Mike Anderson Chief Scientist The PTR Group, Inc. [email protected] Source: www.crobike.de
Transcript
Page 1: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-1

Troubleshooting Troubleshooting Linux Kernel Linux Kernel

Modules and Device Modules and Device DriversDrivers

Mike AndersonChief Scientist

The PTR Group, [email protected]

Source: www.crobike.de

Page 2: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-2

What We’ll Talk AboutWhat We’ll Talk About

How do errors show up in the kernel?Watching kernel/user-space interaction via straceDebugging with printkUsing the /proc file systemUsing the kgdb debuggerDebugging with hardware ala LEDs or a JTAG unit

Page 3: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-3

Challenges of Kernel Challenges of Kernel DebuggingDebugging

There are many features that can make kernel debugging especially difficultOptimizing compilers can rearrange code

• Instruction pointer seems to jump aroundThe use of the MMU can obfuscate addresses

• Physical vs. virtual addressesStartup code is particularly difficult to debug

because of its closeness to the “metal”No equivalent for the user-space gdbserver for

drivers

Early kernel debugging may require hardware assistance

Page 4: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-4

Device Drivers/Kernel ModulesDevice Drivers/Kernel Modules

Assuming that your kernel is otherwise working, most of the problems that you’ll encounter are related to device driversDrivers can either be statically linked or

dynamically loaded to the kernel

A dynamically loaded driver takes the form of a kernel moduleCan be dynamically loaded and unloaded at

kernel run time• Frequently handled by daemons such as udev

Page 5: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-5

Example ModuleExample Module #include <linux/module.h> #include <linux/init.h> #include <linux/kernel.h>

MODULE_LICENSE("GPL and additional rights"); MODULE_AUTHOR( “[email protected]” ); MODULE_DESCRIPTION( “My first driver!” );

int __init mymodule_init_module(void) { printk(KERN_DEBUG “mymodule_init_module() called, "); return 0; }

void __exit mymodule_cleanup_module(void) { printk(KERN_DEBUG “mymodule_cleanup_module() called\

n"); }

module_init(mymodule_init_module); module_exit(mymodule_cleanup_module);

Page 6: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-6

When Things go Wrong…When Things go Wrong…

Problems in device drivers typically manifest themselves in one of three waysKernel panic

• Fatal to the system

Kernel oops• Near fatal to the system

Hardware just doesn’t work correctly• Could be fatal to you!

Source: picasaweb.google.com

Page 7: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-7

Kernel PanicKernel Panic

When the Linux kernel determines that a fatal error has occurred, and no recovery is possible, it “panics”Frequently, an exception in an interrupt

context

Panic outputs a message to the consoleThe output will help you find the source

of the bug

Typically results in a system reboot on an embedded Linux targetOr, blinking keyboard LEDs on some

desktop versions of Linux

Source: regmedia.co.uk

Page 8: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-8

Sample Panic OutputSample Panic OutputEIP: 0060: [<c02b2516>] Not tainted VLIEFLAGS: 00010286 (2.6.18)EIP is at nf_queue+0x16/0x240eax: 00000000 ebx: 00000001 ecx: 00000000 edx: 00000002esi: dece0920 edi: c03a3aa8 ebp: c0277e70 esp: cf165e78ds: 007b es: 007b ss: 0068

Process sh (pid:5278, threadinfo=cf164000 task=dc5b4ab0)Stack: cf165f00 dae29000 00000000 c0277e70 00000001 cf165f00 c03a3aa8 c0277e70c02b1f7c cf165f00 dece0920 00000002 00000001 dae29000 00000000 c0277e70 00000000 dece0920 00000000 da641b40 da54bcde cc318abc c0277c10 00000002

Call Trace:[<c0277e70>] ip_local_deliver_finish+0x0/0x230[<c0277e70>] ip_local_deliver_finish+0x0/0x230… deleted for space …[<c012caa6>] do_softirq+0x26/0x30[<c010528e>] do_IRQ+0x1e/0x30[<c0103ada>] common_interrupt+0x1a/0x20Code: c0 75 ec e9 bd e5 e6 ff 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 57 56 5383 ec 10 8b 54 24 2c 8b 74 24 28 8b 04 9b c0 42 3a c0 <8b> 38 85 ff 0f 84 96 0100 00 86 44 24 2c 8b 7c 24 2c c7 44 24<0>Kernel panic - not syncing: Fatal exception in interrupt

Page 9: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-9

Kernel oopsKernel oopsAn oops message is displayed when a recoverable error has occurred in kernel space: Access to bad address, e.g., through a NULL pointer Illegal or invalid instruction Etc…

The calling user process is killed The system should be considered unstable at this point

The oops message displays: The state of the processor at the time of the fault, including

registers and address of faulting instruction function call stack traceback

The addresses are replaced with symbols if the kallsyms kernel configuration option is selected at kernel compile time

Page 10: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-10

Example OopsExample OopsModules linked in: arc4 ecb crypto_blkcipher ieee80211_crypt_wep vfat fat

ipt_MASQUERADE iptable_nat nf_nat bridge bnep rfcomm l2cap bluetooth ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq fuse loop dm_multipath ipv6 ppdev parport_pc parport video 8139cp output ac battery firewire_ohci fglrx(P) firewire_core crc_itu_t 8139too mii ipw2200 ieee80211 ieee80211_crypt snd_intel8x0m asus_laptop snd_intel8x0 button snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event joydev snd_seq pcspkr serio_raw snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm usb_storage iTCO_wdt snd_timer snd iTCO_vendor_support soundcore snd_page_alloc sr_mod cdrom sg dm_snapshot dm_zero dm_mirror dm_mod pata_acpi ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Pid: 5488, comm: Xorg Tainted: P (2.6.25.3-18.fc9.i686 #1)EIP: 0060:[<c04d038d>] EFLAGS: 00213246 CPU: 0EIP is at task_has_capability+0x48/0x76EAX: 00000030 EBX: f6852030 ECX: f6a8af28 EDX: 00000000ESI: f36e50e0 EDI: f354fec8 EBP: f354fed4 ESP: f354fe84 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068Process Xorg (pid: 5488, ti=f354f000 task=f35f0e90 task.ti=f354f000)Stack: c06d48b5 f6852030 f35f0e90 00000003 f35f0e90 f6852030 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 f6852030 f35f0e90 f66f8000 f354fee4 c04d03da f35f0e90 f8d52580

Page 11: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-11

Example Oops part 2Example Oops part 2Call Trace: [<c04d03da>] ? selinux_capable+0x1f/0x23 [<c04cbfe9>] ? security_capable+0xc/0xe [<c042de97>] ? __capable+0xb/0x22 [<f8c2d560>] ? firegl_version+0x0/0x1b0 [fglrx] [<c042debe>] ? capable+0x10/0x12 [<f8c2d427>] ? firegl_ioctl+0xe7/0x220 [fglrx] [<c062aa49>] ? mutex_lock+0x1d/0x2d [<c04d0c45>] ? file_has_perm+0x7c/0x85 [<f8c24685>] ? ip_firegl_ioctl+0xe/0x10 [fglrx] [<c048d2c8>] ? vfs_ioctl+0x50/0x69 [<c048d51a>] ? do_vfs_ioctl+0x239/0x24c [<c04d0ddb>] ? selinux_file_ioctl+0xa8/0xab [<c048d56d>] ? sys_ioctl+0x40/0x5b [<c0405bf2>] ? syscall_call+0x7/0xb ====================== ======================Code: 00 89 d0 f3 ab 8b 4d b8 89 d8 b2 04 c1 f8

05 c6 45 bc 03 89 5d c4 89 4d c0 74 16 48 b2 45 74 11 53 68 b5 48 6d c0 e8 11 f3 15 00 <0f> 0b 59 5b eb fe 8b 46 08 83 e3 1f 0f b7 f2 8d 55 bc 88 d9 52

EIP: [<c04d038d>] task_has_capability+0x48/0x76 SS:ESP 0068:f354fe84---[ end trace 9451402cb10d9c54 ]---

Page 12: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-12

Enabling Symbolic Kernel Enabling Symbolic Kernel OutputOutput

Page 13: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-13

Module Debugging TechniquesModule Debugging Techniques

Examine the interaction with the kernel via straceThe next line of defense is printkThere may be additional output you’re not

seeing

Next, we can try adding /proc file system entriesInstrument the driver for debugging

Enable source debugging via kgdbUsing hardware debuggers and “blinky lights”

Page 14: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-14

Using Using stracestrace to Watch System to Watch System CallsCalls

When debugging what appears to be a kernel-space error, it can be helpful to watch the system calls that are made from user-spaceSee what events lead to the error

strace displays all system calls made by a programCan display timestamp information per

system call as well

Page 15: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-15

Using Using stracestrace to Watch System to Watch System Calls #2Calls #2

strace displays each system call’s arguments and return valuesstring arguments are printed – very helpful!errno values displayed symbolically

The program being traced runs normallyNot under control of a debugger

• No need to specially compile the user application

You can attach to a running programAnd trace forked applications as well…

Page 16: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-16

Example strace OutputExample strace Output/ # strace ls /dev/labdev execve("/bin/ls", ["ls", "/dev/labdev"], [/* 8 vars */]) = 0fcntl64(0, F_GETFD) = 0fcntl64(1, F_GETFD) = 0fcntl64(2, F_GETFD) = 0geteuid() = 0getuid() = 0getegid() = 0getgid() = 0brk(0) = 0x1028ad68brk(0x1028bd68) = 0x1028bd68brk(0x1028c000) = 0x1028c000ioctl(1, TIOCGWINSZ or TIOCGWINSZ, {ws_row=0, ws_col=0, ws_xpixel=0, ws_ypixel=0ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0lstat("/dev/labdev", {st_mode=S_IFCHR|0644, st_rdev=makedev(254, 0), ...}) = 0open("/etc/localtime", O_RDONLY) = -1 ENOENT (No such file or directory)lstat("/dev/labdev", {st_mode=S_IFCHR|0644, st_rdev=makedev(254, 0), ...}) = 0fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 64), ...}) = 0ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x300write(1, "\33[1;35m/dev/labdev\33[0m\n", 23/dev/labdev) = 23munmap(0x30000000, 4096) = 0exit(0) = ?

Page 17: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-17

Debugging with Debugging with printkprintkprintk debugging is the debug method preferred by Linus At least, according to his email traffic…

Insert messages to be displayed at points of interest in kernel-space code E.g., printk(KERN_INFO “my_x= %d\n”, my_x); Printk works like printf does in user-space

except that printk can only print integers, strings & addresses

printk can also be called from within ISRsMessages can have “importance” settings that allow filtering Importance is set by prepending a 3-character

string to the output message: “<n>”

Page 18: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-18

Debugging with Debugging with printkprintk #2 #2

The messages are placed in a circular buffer that can be retrieved post mortem if neededThe “importance” string that is prepended to the printk message can be found in include/linux/kernel.h:

#define KERN_EMERG "<0>" /* system is unusable */#define KERN_ALERT "<1>" /* action to be taken immediately */#define KERN_CRIT "<2>" /* critical conditions */#define KERN_ERR "<3>" /* error conditions */#define KERN_WARNING "<4>" /* warning conditions */#define KERN_NOTICE "<5>" /* normal but significant condition */#define KERN_INFO "<6>" /* informational */#define KERN_DEBUG "<7>" /* debug-level messages */

Page 19: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-19

Debugging with Debugging with printkprintk #3 #3

Some level of control for printk output can be found in /proc/sys/kernel/printkLet’s look at the following output:

# cat /proc/sys/kernel/printk7 4 1 7

This indicates:•The console_loglevel is 7, so messages with importance of

0..6 will currently go to the console•The default message log level is 4, so messages that do not

specify an importance are treated as level 4•The minimum console log level is 1, so console_loglevel

cannot be set to any value less than 1•The default console log level is 7, so console_loglevel starts

out set to 7

Page 20: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-20

Debugging with Debugging with printkprintk #4 #4

You may control the console_loglevel by writing to /proc/sys/kernel/printkTo enable all printk messages with importance levels 0..7:

# echo 8 > /proc/sys/kernel/printk

If the kernel command line contains the word “debug”, the console_loglevel starts with a value of 10

Page 21: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-21

Using the Using the /proc/proc File System File System

Use /proc entries for driver instrumentationIt is possible to register “write” functions that allow us to dynamically modify values in the kernel (or drivers, or modules) Write values to the /proc file system entry

Read /proc entries allow us to retrieve information from a running kernel entityInformation is provided “live”

Look at the driver source, there may already be /proc entries that can help you

Page 22: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-22

Techniques for Source Techniques for Source DebuggingDebugging

The two primary ways to provide source debugging in the Linux kernel are based on either kgdb or on the use of a hardware JTAG probeUnfortunately, kgdb is not a standard feature of the kernel a/o 2.6.25.8You’ll have to patch your kernel to enable it

Either technique will require the use of a kernel image that is compiled with debugging symbolsUnless you like debugging in assembly language

Page 23: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-23

Compiling the Kernel with Debug Compiling the Kernel with Debug InfoInfo

This will increase the size of the debug kernel image by about 30%However, you don’t need to load the debug version of the kernelLoad the non-debug version to the target, but use

the debug version for the debugger/JTAG probe

Save off the vmlinux and System.map file because these are used by the debugger or by you to find key addressesThe (b)zImage can be loaded on the target as

normal

Page 24: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-24

Enabling Debugging in the Enabling Debugging in the KernelKernel

Page 25: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-25

Kernel gdb (kgdb)Kernel gdb (kgdb)

If you are using a stock kernel, kgdb is not includedLinus doesn’t believe in a source debugger

in the kernelMany commercial Linux vendors do include

it in their distributions thoughkgdb can be downloaded from:http://kgdb.linsyssoft.com/downloads.htmor http://sourceforge.net and look for kgdbYou’ll need to patch the kernel

A new kgdb light is in the works for 2.6.26Kgdb over the system console

Page 26: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-26

Enable kgdb in the KernelEnable kgdb in the Kernel

Page 27: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-27

Kgdb Light in 2.6.26-rc8Kgdb Light in 2.6.26-rc8

Uses system console for I/O

Page 28: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-28

kgdb Lash upkgdb Lash up

kgdb supports debugging via the serial portThe gdb debugger is running on a second machine

using the vmlinux you compiled with debugging symbols

You attach to the system being debugged using gdb’s “target remote” command

Host

Target

Network

Ethernet

RS-232

Page 29: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-29

Hardware-assisted DebuggingHardware-assisted Debugging

There are a number of devices that can help with debuggingLEDs, JTAGs, logic analyzers, oscilloscopes,

bus analyzers and more

These can range from a few cents to implement to several 10s of thousands of dollarsYou typically get what you pay for

Page 30: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-30

Debugging with LEDsDebugging with LEDsVery simple:Blink on/off in various code sections under debugBlink in sequencesCan display multiple-bit codes if multiple LEDs are

available

Very fast, little impact on run-time performanceAdding LED debug code will likely not “make the

problem go away”

These may be the only option for debugging early x86 code

Page 31: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-31

Debugging with LEDs, Debugging with LEDs, CaveatsCaveats

The LED(s) must be free for useNot tied in hardware to a network PHY

or to displaying power status, for example

LED are not very verboseYou must decipher what the blinking

meansCan be difficult to determine where

you are in the code

You can also attach an oscilloscope to the GPIO pins found on many processors for more information

Page 32: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-32

Hardware DebuggersHardware Debuggers

In the past, in-circuit emulators (ICE) where the debugger of choice You pulled the CPU, plugged the ICE in and plugged the CPU

into the ICE But, these where $80K+ each

Logic analyzers are also good to have But, they are $35K+ for an empty mainframe PC-based versions can be had for < $1K

IEEE 1149.1 (JTAG) has become the debugger da jour JTAG uses a boundary-scan protocol These range from $70 to $20K depending on model and

features At a minimum, a JTAG is really a “must-have” for firmware

and board bring up

Page 33: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-33

Debugging with a JTAG ProbeDebugging with a JTAG Probe

Debugging with a JTAG unit is much less involved than using kgdbCompile the kernel with debugging enabledNo need to patch the kernel for kgdbAssumes your platform supports a JTAG interface

You connect to the JTAG unit using whatever technique your JTAG probe requiresYour JTAG GUI is dependent on the vendorFor those JTAG units that are gdb-aware, use the

appropriate target remote commands

Page 34: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-34

Example JTAG UsageExample JTAG Usage

Connect the JTAG to the target and the hostStart the host application to control the JTAGReset the target and load the

register configuration settings to the JTAG unit

Load code and enjoy!Useful for debugging drivers

as well as bringing up new firmware and BSPs

Page 35: Troubleshooting Linux Kernel Modules And Device Drivers

08/05/2008 - Copyright 2008, The PTR Group, Inc.LinuxWorld-SF-35

SummarySummary

“Real developers” use printk – or at least Linus does Tools like strace allow you see the flow of executionThe /proc filesystem gives you a window into the kernel/driversKgdb uses familiar gdb technology but at the kernel levelLEDs are a fast and easy way to get info out of the machine as wellHardware JTAG debug tools may be availableThese can be invaluable if you can get one


Recommended