Wednesday, November 15, 2006

debugging the linux kernel

I'm taking baby steps today, debugging the kernel.
This seems a very useful resource.

I tested the COM ports between the two machines using the instructions there, and the link works great.

A tiny missing step that would be useful to a newbie like me unfamiliar with patching the kernel:

Under "Applying the kgdb patch", you need to change into the linux directory before executing the patch commands. So between steps 1 and 2, there has to be an extra step:

cd ${BASE_DIR}/linux-2.6.7

Also, the path to the patches is missing a subdirectory. The patches are all unzipped into ${BASE_DIR}/patch-kgdb/linux-2.6.7-kgdb-2.2/, so the patch commands in steps 2 to 7 should follow this pattern:

patch -p1 < ${BASE_DIR}/patch-kgdb/linux-2.6.7-kgdb-2.2/core-lite.patch
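The two corrections above can be folded into a small shell helper. This is only a sketch under a few assumptions: apply_kgdb_patch is a made-up name (it is not part of the kgdb tarball), GNU patch is assumed for the --dry-run option, and the patch path has to be absolute because the helper changes directory before reading it.

```shell
# Sketch: apply one patch from inside the kernel source tree.
# apply_kgdb_patch is a hypothetical helper, not something kgdb ships.
apply_kgdb_patch() {
    src_dir=$1      # kernel tree, e.g. ${BASE_DIR}/linux-2.6.7
    patch_file=$2   # absolute path to the patch file

    [ -d "$src_dir" ] || { echo "no such source tree: $src_dir" >&2; return 1; }
    [ -f "$patch_file" ] || { echo "no such patch: $patch_file" >&2; return 1; }

    (
        # This is the missing step: patch -p1 must run inside the tree.
        cd "$src_dir" || exit 1
        # Dry run first, so a patch that does not apply leaves the tree untouched.
        patch -p1 --dry-run < "$patch_file" >/dev/null || exit 1
        patch -p1 < "$patch_file"
    )
}

# Example call (paths as described in this post):
#   apply_kgdb_patch "${BASE_DIR}/linux-2.6.7" \
#       "${BASE_DIR}/patch-kgdb/linux-2.6.7-kgdb-2.2/core-lite.patch"
```

The subshell keeps the cd from leaking into the calling shell, so the helper can be called once per patch without having to cd back each time.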

Snag 2:

I realized I needed the qt-devel package to run 'make xconfig', which is fully graphical. So I decided to use 'make menuconfig' instead (which I've done before, and which uses a text-based menu). Then I ran into this compiler error. After applying the simple patch (remove 'static' from the declaration of 'current_menu'), I could proceed with the build. Now I had the menu where I could make the kernel selections: progress!

Snag 3:

I realized that 2.6.7 was written with GCC 3 in mind and does not compile with GCC 4 (which is what most newer machines have). I decided to be macho and 'fix' the kernel code to get the compiler to sing. Still going strong after 1 day...
This is a good article on the GCC 4.0 changes that affect Linux 2.6.7 as well.

Snag 4:

I managed to fix all the compiler and linker errors (they were due to the stricter way GCC 4.0 treats the inline and static keywords). I transferred the image over to the test machine and changed grub, but it wouldn't boot: there was a problem mounting the file system. Upon investigating, I found that Linux needs the initrd image to boot (the initrd image loads the drivers into RAM). But to build the initrd image, you need to install the modules as well (mkinitrd, which you use to make the initrd image, needs the kernel version, and it looks inside /lib/modules/ for the drivers).

Since I didn't have the modules for this kernel version installed on the test machine, I just gave mkinitrd a valid kernel version that was already on the box, and it successfully built me an initrd image. I moved it over to /boot, edited grub to point to it, and rebooted. No joy! It was still failing to mount.

Over the weekend, I was chatting with a good friend who is now hacking away in the Linux kernel (he was previously an engineer in the Windows kernel), and he said that this is probably because Linux checks the driver versions. So I decided to install the modules on the dev machine, make a correct initrd image there, and copy it over to the test machine. This time it actually booted!
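The mistake in this snag was pointing mkinitrd at a kernel version whose modules did not match the kernel being booted. A small guard along these lines would have caught it. It is only a sketch: have_modules_for is a made-up helper, the version string 2.6.7 is an assumption (use whatever directory name actually appears under /lib/modules), and the mkinitrd call in the comment assumes the Red Hat style "mkinitrd image version" syntax.

```shell
# Sketch: check that modules for a kernel version are installed
# before building an initrd for it.
# have_modules_for is a hypothetical helper; the optional second argument
# (modules root) exists only so the check can be tried outside /lib/modules.
have_modules_for() {
    kver=$1
    root=${2:-/lib/modules}
    # Succeeds only if a non-empty version was given and its module
    # directory exists under the modules root.
    [ -n "$kver" ] && [ -d "$root/$kver" ]
}

# Typical use on the dev machine, as root, after "make modules_install":
#   have_modules_for 2.6.7 && mkinitrd /boot/initrd-2.6.7.img 2.6.7
```

If the check fails, the modules for that kernel have not been installed on the machine where the initrd is being built.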

Here is where you can read how to do a regular kernel compilation (for the modules install part)

Snag 5:

SIGSEGV in the kernel! Well, I was happy. We came this far, and here's my chance to look at some kernel code and see what's going wrong.

Here's the immediate output from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1]
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0xc03051db in psmouse_interrupt (serio=0xc048cde0, data=250 '\uffff', flags=0,
regs=0x0) at drivers/input/mouse/psmouse-base.c:206


And here is line 206 from psmouse-base.c:

rc = psmouse->protocol_handler(psmouse, regs);

So it seems that somehow the protocol handler for the mouse is not set. This is a USB mouse, so perhaps I forgot to set an option in make menuconfig...
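One cheap way to follow up on the menuconfig hunch is to look at how the mouse-related options ended up in the generated .config. The sketch below is only that: config_state is a made-up helper, and the option names in the example loop are a guess at which ones matter for this setup, not a confirmed diagnosis.

```shell
# Sketch: report how a kernel option is set in a .config file.
# config_state is a hypothetical helper; it prints y, m, or "unset".
config_state() {
    option=$1
    config_file=${2:-.config}
    # Enabled options look like CONFIG_FOO=y or CONFIG_FOO=m; disabled ones
    # appear as "# CONFIG_FOO is not set" and therefore do not match.
    value=$(sed -n "s/^${option}=\([ym]\)\$/\1/p" "$config_file" | head -n 1)
    echo "${value:-unset}"
}

# Example, run from the top of the kernel tree (option names are guesses):
#   for opt in CONFIG_MOUSE_PS2 CONFIG_USB_HID CONFIG_INPUT_MOUSEDEV; do
#       echo "$opt: $(config_state "$opt" .config)"
#   done
```

Comparing this output against the .config of a kernel that boots cleanly on the same machine should show quickly whether a mouse option really is missing.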
