Linux Kernel Experiences

Warning

First off, let me say that I am by no means a kernel hacker. I am just documenting my experiences with the kernel, nothing more.

MSI K8T Neo kernel 2.6 Oops errors.

I bought the MSI K8T Neo motherboard with an AMD Athlon 64 3200+ chip, and whenever I put the filesystem under heavy load it would Oops. I first noticed this when trying to restore a PostgreSQL database.

I searched high and low for a decent guide on deciphering the Oops output but could not find much that I could use. After much rummaging around I found that I needed to save the Oops output to a file and then use ksymoops to get some information from it. This was a disaster, because I got nothing but warnings and errors about how untrustworthy my findings were. On further investigation I noticed that you could compile the kernel with the option

CONFIG_KALLSYMS=y

set in the config file. This means that when the kernel oopses it resolves the symbols itself and produces the kind of output you would otherwise expect from ksymoops. This was nice to have, and it put my mind at rest: the oopses still appeared with the option set, so I wasn't going mad.

One thing to note is that once you have had an Oops, any further Oops cannot be trusted. This is because you are effectively running a broken system. It might not appear broken, but it is; the only thing an Oopsed system is good for is debugging.

How to read an Oops

I would love to find a tutorial on this, because I spent ages looking for one and the ones I did find were a bit above my head. I did try to debug the problem I had, but I was a bit shocked at the amount of stuff you need to know just to fart without following through when dealing with the Linux kernel. The kernel is complicated, complicated, complicated and, in case you didn't hear me the first time, complicated. 1600 Pennsylvania Avenue and its occupants can only dream of appearing this complicated. I have to say appearances can be deceiving: in this case 1600 P. Avenue is easy to understand; the kernel unfortunately isn't, or at least that has been my (limited) experience of it.

The first thing you need to work with an oops is the oops itself. I am only going to talk about oopses that have had their symbols resolved, either via ksymoops or by using the

CONFIG_KALLSYMS=y

kernel option. I use the kernel option because I don't need to remember to do anything extra, and for someone like me that's a good thing.
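Before rebuilding, it can help to confirm whether a given kernel config already has the option switched on. The following is a minimal sketch, not part of the original investigation; the function name kallsyms_enabled is made up for illustration, and it only inspects config text you hand to it (for a real kernel you would read the .config file from your source tree).

```python
def kallsyms_enabled(config_text):
    """Return True if the config text contains an active CONFIG_KALLSYMS=y line."""
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_KALLSYMS is not set" and do not match
        if line.strip() == "CONFIG_KALLSYMS=y":
            return True
    return False


# Two small hand-written samples standing in for a real .config file
enabled = "CONFIG_X86=y\nCONFIG_KALLSYMS=y\n"
disabled = "CONFIG_X86=y\n# CONFIG_KALLSYMS is not set\n"

print(kallsyms_enabled(enabled))   # True
print(kallsyms_enabled(disabled))  # False
```

The comparison is against the whole line on purpose, so that a commented-out entry or a longer option name that merely starts with the same text is not mistaken for the option itself.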

Kernel Bugs

As soon as I got the bug I wanted to send it straight to the Linux kernel mailing list, on the assumption that someone there would have seen it before. But, heeding many warnings and having seen scorch-marked newbies before, I refrained and decided to investigate a little first. This was in part due to my own curiosity; I also didn't want to make an arse of myself by posting a bug that was solved in kernel 0.86.

The first thing I looked for was some guidance on how to go about reading the oops output. I found various snippets of text here and there, and from these I gathered that a normal oops output is not in a human-readable format (when I say human I am disregarding kernel wizards; these people have yet to be classified / categorised). Prodding deeper, I discovered that I needed to take the oops file and use a tool to get it into a format that was readable, or at least more readable. This tool is known as "ksymoops" and should be installed on your system. I did not have much luck using ksymoops because I got various warnings about how unreliable my oops was. On further investigation I noticed that I could compile a kernel to give me an oops in a readable format (please see the notes above). I recompiled my kernel with the option set and set about reproducing the oops, and lo and behold the oops was produced in a different but, at first glance, equally unintelligible format. Below you can see the oops that I got.

            Unable to handle kernel paging request at virtual address 00ff0744
            printing eip:
            c01e5351
            *pde = 00000000
            Oops: 0000 [#1]
            CPU: 0
            0060:[generic_make_request+17/384] Not tainted
            EFLAGS: 00010282 (2.6.5)
            EIP is at generic_make_request+0x11/0x180
            eax: 00000202 ebx: 007d8008 ecx: 00ff0740 edx: e8eea300
            esi: fb001000 edi: e8eea300 ebp: 00000040 esp: f3ee5d70
            ds: 007b es: 007b ss: 0068
            Process kjournald (pid: 868, threadinfo=f3ee4000 task=f572b2a0)
            Stack: f3ee5d70 f3ee5d70 f3ee5da4 00000082 f1b9f8c0 e8eea300 00000000
            00000000
            00000010 c01495ab f7fed8a0 00000010 00000000 e908fbd0 00000001
            007d8008
            00000013 00000001 00000040 c01e54fd e8eea300 e908fbd0 c0148fb0
            00000001
            Call Trace:
            [bio_alloc+203/416] bio_alloc+0xcb/0x1a0
            [submit_bio+61/112] submit_bio+0x3d/0x70
            [ll_rw_block+96/128] ll_rw_block+0x60/0x80
            [journal_commit_transaction+3533/4048]
            journal_commit_transaction+0xdcd/0xfd0
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [kjournald+180/464] kjournald+0xb4/0x1d0
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [ret_from_fork+6/20] ret_from_fork+0x6/0x14
            [commit_timeout+0/16] commit_timeout+0x0/0x10
            [kjournald+0/464] kjournald+0x0/0x1d0
            [kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14

            Code: 8b 41 04 c1 ee 09 8b 50 38 8b 40 34 0f ac d0 09 85 c0 89 c3

The oops above appears incomprehensible but it isn't. I knew absolutely nothing about them until today and I managed to track my problem down. I am not saying I know much more about them now but I now know that the line that is of the most interest in the oops above is:

            0060:[generic_make_request+17/384] Not tainted

It's this line that tells me where the oops occurred, i.e. which function was the scene of the crime. The numbers after the function name are the offset into the function at which the fault happened and the total size of the function, both in bytes. The "Not tainted" bit at the end is a signal for kernel hackers to easily check whether I have been loading proprietary drivers or doing some odd non-standard stuff to the kernel. If the kernel is tainted it has gone off the beaten track and gets exponentially harder to debug. Some kernel hackers won't help you if your kernel has been tainted.

The next job was to try and locate the function in the kernel source. To do this I issued the following commands
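The same information appears a second time in the oops, in hexadecimal, on the "EIP is at" line. As a rough sketch of how to pull it apart (the names parse_eip_line and function_start are invented for this example, and the pattern only covers the line format shown in the oops above):

```python
import re

# Matches lines such as "EIP is at generic_make_request+0x11/0x180"
EIP_AT = re.compile(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_eip_line(line):
    """Return (function, offset, size) from an 'EIP is at' line, or None."""
    match = EIP_AT.search(line)
    if match is None:
        return None
    name, offset, size = match.groups()
    return name, int(offset, 16), int(size, 16)


def function_start(eip, offset):
    """Address of the function's first byte: the faulting EIP minus the offset."""
    return eip - offset


name, offset, size = parse_eip_line("EIP is at generic_make_request+0x11/0x180")
print(name, offset, size)  # generic_make_request 17 384

# c01e5351 is the value printed under "printing eip:" in the oops above
print(hex(function_start(0xC01E5351, offset)))  # 0xc01e5340
```

The decimal values 17 and 384 are the same numbers shown in the bracketed form generic_make_request+17/384, so the two lines agree. Subtracting the offset from the EIP gives the address at which the function starts.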

        cd /usr/src/linux/
        grep -r "generic_make_request" *

This produced a lot of output, but only one of the files will give a line similar to

          drivers/block/ll_rw_blk.c:EXPORT_SYMBOL(generic_make_request);

This is where the function belongs, i.e. where it is defined. It was at this point that I got a bit stuck. I know enough C to be dangerous, but then anyone who can code "Hello World" in C is dangerous. C is a firm supporter of the 2nd Amendment and your right to bear arms; how you use them is entirely up to you. C is just the arms dealer. I rummaged around in

        /usr/src/linux/drivers/block/ll_rw_blk.c

for a while until I realised that I was getting nowhere, even after having looked at various header files etc.
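The grep above matches every mention of the function, which is why it produces so much output. A narrower search looks only for the EXPORT_SYMBOL line. The sketch below is one way to do that; find_export is a made-up helper, it only looks at .c files, and it assumes the export is written exactly as in the line shown above. It is demonstrated on a throwaway directory rather than a real kernel tree.

```python
import os
import tempfile


def find_export(root, symbol):
    """Return paths of .c files under root that contain EXPORT_SYMBOL(symbol);"""
    needle = f"EXPORT_SYMBOL({symbol});"
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".c"):
                continue
            path = os.path.join(dirpath, filename)
            # Source files may contain odd bytes, so decoding errors are ignored
            with open(path, errors="ignore") as fh:
                if needle in fh.read():
                    hits.append(path)
    return sorted(hits)


# Build a tiny stand-in tree: one file exports the symbol, one merely calls it
with tempfile.TemporaryDirectory() as root:
    block_dir = os.path.join(root, "drivers", "block")
    os.makedirs(block_dir)
    with open(os.path.join(block_dir, "ll_rw_blk.c"), "w") as fh:
        fh.write("EXPORT_SYMBOL(generic_make_request);\n")
    with open(os.path.join(root, "caller.c"), "w") as fh:
        fh.write("generic_make_request(bio);\n")

    for path in find_export(root, "generic_make_request"):
        print(os.path.relpath(path, root))
```

Pointing the same function at /usr/src/linux should report only the file that exports the symbol, not every file that calls it.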

Repeating an Oops

I read somewhere that repeating an oops can sometimes be quite a good way of tracking it down. Even though I had had this problem several times, I had not actually compared several oopses to see if they were all similar or occurring in the same function, so this is what I did next. I rebooted the machine and re-created the oops. The next oops can be seen below

            Unable to handle kernel paging request at virtual address 00650d50
            printing eip:
            c0161fb4
            *pde = 00000000
            Oops: 0000 [#1]
            CPU: 0
            EIP: 0060:[mpage_writepage+116/1344] Not tainted
            EFLAGS: 00010246 (2.6.5)
            EIP is at mpage_writepage+0x74/0x540
            eax: 2000102d ebx: 00000000 ecx: 0000000c edx: 00650d50
            esi: c10a9998 edi: 00650d50 ebp: f753e180 esp: c1ba9d10
            ds: 007b es: 007b ss: 0068
            Process pdflush (pid: 6, threadinfo=c1ba8000 task=c1bab700)
            Stack: eaf57800 c10a99c0 00001000 00000000 00000000 00000000 00000000
            00000000
            00000000 00000001 ec79e6c0 00000001 0000000c f753e20c ec79e6c0
            4c5a0d66
            0000e2c2 99f3269a 00000071 c1ba9d8c 00000082 00000001 c0112c3d
            00000000
            Call Trace:
            [scheduler_tick+109/1296] scheduler_tick+0x6d/0x510
            [schedule+740/1280] schedule+0x2e4/0x500
            [mpage_writepages+596/704] mpage_writepages+0x254/0x2c0
            [ext2_get_block+0/880] ext2_get_block+0x0/0x370
            [ext2_writepages+31/48] ext2_writepages+0x1f/0x30
            [ext2_get_block+0/880] ext2_get_block+0x0/0x370
            [do_writepages+30/64] do_writepages+0x1e/0x40
            [__sync_single_inode+169/480] __sync_single_inode+0xa9/0x1e0
            [sync_sb_inodes+331/496] sync_sb_inodes+0x14b/0x1f0
            [writeback_inodes+51/80] writeback_inodes+0x33/0x50
            [background_writeout+123/192] background_writeout+0x7b/0xc0
            [pdflush+0/48] pdflush+0x0/0x30
            [__pdflush+159/336] __pdflush+0x9f/0x150
            [pdflush+40/48] pdflush+0x28/0x30
            [background_writeout+0/192] background_writeout+0x0/0xc0
            [pdflush+0/48] pdflush+0x0/0x30
            [kthread+165/176] kthread+0xa5/0xb0
            [kthread+0/176] kthread+0x0/0xb0
            [kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14

            Code: 8b 02 a8 04 0f 85 f2 02 00 00 8b 02 a8 10 0f 85 8c 02 00 00

I noticed a major difference between this one and the last. The oops was being generated in a different function. This was a bit odd and not what I expected at all. I would have expected the two oopsen to look at least vaguely similar, but they didn't. This did not bode well for me, because it suggests a spurious error of the kind that is normally hardware related.

Reproducing the oops again gave me

            Unable to handle kernel paging request at virtual address 000e1b58
            printing eip:
            c0133e61
            *pde = 00000000
            Oops: 0002 [#1]
            CPU: 0
            EIP: 0060:[activate_page+49/128] Not tainted
            EFLAGS: 00010046 (2.6.5)
            EIP is at activate_page+0x31/0x80
            eax: c17f7a50 ebx: c10ebe60 ecx: c10ebe78 edx: 000e1b58
            esi: c0300fd8 edi: c10ebe60 ebp: d1883ae0 esp: d588fd68
            ds: 007b es: 007b ss: 0068
            Process postmaster (pid: 823, threadinfo=d588e000 task=df3cacc0)
            Stack: c10ebe60 00001000 c0133ed8 00000000 c012df8b d93a70c0 c10ebe60
            00000000
            00001000 d588fdf4 00000001 00000001 00000337 c62a5c40 c014a1fd
            c1b49600
            00000000 00000001 00001000 00000000 d588fdf4 00000000 d1883a54
            40a10d00
            Call Trace:
            [mark_page_accessed+40/48] mark_page_accessed+0x28/0x30
            [generic_file_aio_write_nolock+1099/2672]
            generic_file_aio_write_nolock+0x44b/0xa70
            [bio_hw_segments+45/48] bio_hw_segments+0x2d/0x30
            [scheduler_tick+31/1296] scheduler_tick+0x1f/0x510
            [buffered_rmqueue+191/352] buffered_rmqueue+0xbf/0x160
            [update_process_times+70/96] update_process_times+0x46/0x60
            [update_wall_time+11/64] update_wall_time+0xb/0x40
            [do_timer+223/240] do_timer+0xdf/0xf0
            [generic_file_aio_write+119/160] generic_file_aio_write+0x77/0xa0
            [ext3_file_write+68/192] ext3_file_write+0x44/0xc0
            [do_sync_write+139/192] do_sync_write+0x8b/0xc0
            [permission+70/80] permission+0x46/0x50
            [permission+70/80] permission+0x46/0x50
            [get_empty_filp+104/224] get_empty_filp+0x68/0xe0
            [update_process_times+70/96] update_process_times+0x46/0x60
            [dentry_open+282/432] dentry_open+0x11a/0x1b0
            [filp_open+98/112] filp_open+0x62/0x70
            [do_sync_write+0/192] do_sync_write+0x0/0xc0
            [vfs_write+184/304] vfs_write+0xb8/0x130
            [sys_write+66/112] sys_write+0x42/0x70
            [syscall_call+7/11] syscall_call+0x7/0xb

            Code: 89 02 c7 41 04 00 02 20 00 c7 43 18 00 01 10 00 ff 4e 2c 0f

This is another oops which appears completely different from the last one, so I started to think that my problem was hardware related and not a kernel problem at all. It was at this point that I decided to send a bug report to the kernel mailing list. I can hear some of you ask why I sent it if I knew it was hardware. The answer is that I didn't know it was hardware; I was just guessing based on prior experience of electronics. The bug report is at the bottom of the page.

I got a couple of direct replies from people, one of which suggested using memtest86+ to test my memory. I had already downloaded it, so I decided to give it a whirl. To start memtest86+ you need to configure lilo to boot it rather than the kernel. The following lines are what I added to /etc/lilo.conf
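Comparing the oopses by eye works for three of them, but the check is simple enough to script. This is only a sketch under the assumption that each oops contains an "EIP is at" line in the format shown above; faulting_functions and same_location are hypothetical names. Differing functions are a hint of a spurious fault, not proof of one.

```python
import re

# Captures the function name from lines like "EIP is at activate_page+0x31/0x80"
EIP_AT = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def faulting_functions(oops_texts):
    """Return the function named on the 'EIP is at' line of each oops."""
    names = []
    for text in oops_texts:
        match = EIP_AT.search(text)
        names.append(match.group(1) if match else None)
    return names


def same_location(oops_texts):
    """True only when every oops names the same faulting function."""
    names = faulting_functions(oops_texts)
    return len(names) > 0 and None not in names and len(set(names)) == 1


# Only the relevant line from each of the three oopses above is reproduced here
oopses = [
    "EIP is at generic_make_request+0x11/0x180",
    "EIP is at mpage_writepage+0x74/0x540",
    "EIP is at activate_page+0x31/0x80",
]

print(faulting_functions(oopses))
print(same_location(oopses))  # False
```

In practice the full text of each saved oops can be passed in, since the pattern searches for the line anywhere in the text.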

           image=/boot/memtest86+.bin
           label=memtest86

Remember to change the "default=label_name" entry to use the new image, and to run lilo afterwards so that the change takes effect. Do not reboot the machine unless you have a working rescue disk or a working boot disk, otherwise you won't be able to get back into the machine.

Memtest86 errors

Please be aware that just because memtest finds errors it does not necessarily mean you have dodgy RAM. You could have a bad motherboard or chipset, or something may not be seated correctly. When I ran memtest86 I got errors during tests 5 and 6. This was disappointing, because hardware bugs are a bit more terminal than software bugs and usually involve spending more money or wrangling with your supplier to get the parts replaced. I ran memtest a few times and got errors during the same tests without fail, so I resigned myself to the fact that my memory was dodgy.

This is when I noticed something quite obvious that I should have noticed before. The RAM speed was showing DDR333. This rang alarm bells with me: I had bought the RAM quite a while ago and was pretty sure that it was slow stuff, i.e. DDR266 or something similar. I rebooted the machine and found the DDR timings, which were set to "Auto". I reduced the timing to DDR300, reran memtest and, lo and behold, no more errors.

Below is a bug report I sent to the linux kernel mailing list.

            I have not submitted a bug report before so I hope this is enough
            information. If any more is required please let me know.

            [1] Getting Oops's during heavy filesystem access

            [2] I initially thought this was a hardware problem because I was
            trying to use the SATA on a MSI K8T Neo motherboard. I switched to
            using normal IDE disks and got another Oops using the 2.6.5 kernel. I
            reverted back to the old binary 2.2.20 kernel and tried to reproduce
            the problem but was unable to.

            [3] IDE SATA


            [4.] kernel 2.6.5

            [5.] I have had three separate Oops all of which look completely
            different, at least to me. I have only included the first Oops from
            each occurrence.


            Unable to handle kernel paging request at virtual address 00ff0744
            printing eip:
            c01e5351
            *pde = 00000000
            Oops: 0000 [#1]
            CPU: 0
            0060:[generic_make_request+17/384] Not tainted
            EFLAGS: 00010282 (2.6.5)
            EIP is at generic_make_request+0x11/0x180
            eax: 00000202 ebx: 007d8008 ecx: 00ff0740 edx: e8eea300
            esi: fb001000 edi: e8eea300 ebp: 00000040 esp: f3ee5d70
            ds: 007b es: 007b ss: 0068
            Process kjournald (pid: 868, threadinfo=f3ee4000 task=f572b2a0)
            Stack: f3ee5d70 f3ee5d70 f3ee5da4 00000082 f1b9f8c0 e8eea300 00000000
            00000000
            00000010 c01495ab f7fed8a0 00000010 00000000 e908fbd0 00000001
            007d8008
            00000013 00000001 00000040 c01e54fd e8eea300 e908fbd0 c0148fb0
            00000001
            Call Trace:
            [bio_alloc+203/416] bio_alloc+0xcb/0x1a0
            [submit_bio+61/112] submit_bio+0x3d/0x70
            [ll_rw_block+96/128] ll_rw_block+0x60/0x80
            [journal_commit_transaction+3533/4048]
            journal_commit_transaction+0xdcd/0xfd0
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [kjournald+180/464] kjournald+0xb4/0x1d0
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
            [ret_from_fork+6/20] ret_from_fork+0x6/0x14
            [commit_timeout+0/16] commit_timeout+0x0/0x10
            [kjournald+0/464] kjournald+0x0/0x1d0
            [kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14

            Code: 8b 41 04 c1 ee 09 8b 50 38 8b 40 34 0f ac d0 09 85 c0 89 c3


            Unable to handle kernel paging request at virtual address 00650d50
            printing eip:
            c0161fb4
            *pde = 00000000
            Oops: 0000 [#1]
            CPU: 0
            EIP: 0060:[mpage_writepage+116/1344] Not tainted
            EFLAGS: 00010246 (2.6.5)
            EIP is at mpage_writepage+0x74/0x540
            eax: 2000102d ebx: 00000000 ecx: 0000000c edx: 00650d50
            esi: c10a9998 edi: 00650d50 ebp: f753e180 esp: c1ba9d10
            ds: 007b es: 007b ss: 0068
            Process pdflush (pid: 6, threadinfo=c1ba8000 task=c1bab700)
            Stack: eaf57800 c10a99c0 00001000 00000000 00000000 00000000 00000000
            00000000
            00000000 00000001 ec79e6c0 00000001 0000000c f753e20c ec79e6c0
            4c5a0d66
            0000e2c2 99f3269a 00000071 c1ba9d8c 00000082 00000001 c0112c3d
            00000000
            Call Trace:
            [scheduler_tick+109/1296] scheduler_tick+0x6d/0x510
            [schedule+740/1280] schedule+0x2e4/0x500
            [mpage_writepages+596/704] mpage_writepages+0x254/0x2c0
            [ext2_get_block+0/880] ext2_get_block+0x0/0x370
            [ext2_writepages+31/48] ext2_writepages+0x1f/0x30
            [ext2_get_block+0/880] ext2_get_block+0x0/0x370
            [do_writepages+30/64] do_writepages+0x1e/0x40
            [__sync_single_inode+169/480] __sync_single_inode+0xa9/0x1e0
            [sync_sb_inodes+331/496] sync_sb_inodes+0x14b/0x1f0
            [writeback_inodes+51/80] writeback_inodes+0x33/0x50
            [background_writeout+123/192] background_writeout+0x7b/0xc0
            [pdflush+0/48] pdflush+0x0/0x30
            [__pdflush+159/336] __pdflush+0x9f/0x150
            [pdflush+40/48] pdflush+0x28/0x30
            [background_writeout+0/192] background_writeout+0x0/0xc0
            [pdflush+0/48] pdflush+0x0/0x30
            [kthread+165/176] kthread+0xa5/0xb0
            [kthread+0/176] kthread+0x0/0xb0
            [kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14

            Code: 8b 02 a8 04 0f 85 f2 02 00 00 8b 02 a8 10 0f 85 8c 02 00 00

            Unable to handle kernel paging request at virtual address 000e1b58
            printing eip:
            c0133e61
            *pde = 00000000
            Oops: 0002 [#1]
            CPU: 0
            EIP: 0060:[activate_page+49/128] Not tainted
            EFLAGS: 00010046 (2.6.5)
            EIP is at activate_page+0x31/0x80
            eax: c17f7a50 ebx: c10ebe60 ecx: c10ebe78 edx: 000e1b58
            esi: c0300fd8 edi: c10ebe60 ebp: d1883ae0 esp: d588fd68
            ds: 007b es: 007b ss: 0068
            Process postmaster (pid: 823, threadinfo=d588e000 task=df3cacc0)
            Stack: c10ebe60 00001000 c0133ed8 00000000 c012df8b d93a70c0 c10ebe60
            00000000
            00001000 d588fdf4 00000001 00000001 00000337 c62a5c40 c014a1fd
            c1b49600
            00000000 00000001 00001000 00000000 d588fdf4 00000000 d1883a54
            40a10d00
            Call Trace:
            [mark_page_accessed+40/48] mark_page_accessed+0x28/0x30
            [generic_file_aio_write_nolock+1099/2672]
            generic_file_aio_write_nolock+0x44b/0xa70
            [bio_hw_segments+45/48] bio_hw_segments+0x2d/0x30
            [scheduler_tick+31/1296] scheduler_tick+0x1f/0x510
            [buffered_rmqueue+191/352] buffered_rmqueue+0xbf/0x160
            [update_process_times+70/96] update_process_times+0x46/0x60
            [update_wall_time+11/64] update_wall_time+0xb/0x40
            [do_timer+223/240] do_timer+0xdf/0xf0
            [generic_file_aio_write+119/160] generic_file_aio_write+0x77/0xa0
            [ext3_file_write+68/192] ext3_file_write+0x44/0xc0
            [do_sync_write+139/192] do_sync_write+0x8b/0xc0
            [permission+70/80] permission+0x46/0x50
            [permission+70/80] permission+0x46/0x50
            [get_empty_filp+104/224] get_empty_filp+0x68/0xe0
            [update_process_times+70/96] update_process_times+0x46/0x60
            [dentry_open+282/432] dentry_open+0x11a/0x1b0
            [filp_open+98/112] filp_open+0x62/0x70
            [do_sync_write+0/192] do_sync_write+0x0/0xc0
            [vfs_write+184/304] vfs_write+0xb8/0x130
            [sys_write+66/112] sys_write+0x42/0x70
            [syscall_call+7/11] syscall_call+0x7/0xb

            Code: 89 02 c7 41 04 00 02 20 00 c7 43 18 00 01 10 00 ff 4e 2c 0f



            [6] I found the problem while restoring a database

            cat database.gz | gunzip | psql dbname

            where database.gz is a 600Mb file

            [7] Debian sarge ( mild and sunny ;-)


            [7.1]
            debian:~# /usr/src/kernel-source-2.6.5/scripts/ver_linux
            If some fields are empty or look unusual you may have an old version.
            Compare to the current minimal requirements in Documentation/Changes.

            Linux debian 2.6.5 #1 Sun Apr 25 19:53:20 BST 2004 i686 GNU/Linux

            Gnu C 3.3.3
            Gnu make 3.80
            binutils 2.14.90.0.7
            util-linux 2.12
            mount 2.12
            module-init-tools 3.0-pre10
            e2fsprogs 1.35
            pcmcia-cs 3.2.5
            PPP 2.4.2
            Linux C Library 2.3.2
            Dynamic linker (ldd) 2.3.2
            Procps 3.2.0
            Net-tools 1.60
            Console-tools 0.2.3
            Sh-utils 5.0.91
            Modules Loaded tulip crc32 af_packet


            [7.2.]
            debian:~# cat /proc/cpuinfo
            processor : 0
            vendor_id : AuthenticAMD
            cpu family : 15
            model : 4
            model name : AMD Athlon(tm) 64 Processor 3200+
            stepping : 8
            cpu MHz : 2001.027
            cache size : 1024 KB
            fdiv_bug : no
            hlt_bug : no
            f00f_bug : no
            coma_bug : no
            fpu : yes
            fpu_exception : yes
            cpuid level : 1
            wp : yes
            flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
            mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall mmxext lm 3dnowext
            3dnow
            bogomips : 3940.35

            [7.3.]
            debian:~# cat /proc/modules
            tulip 36640 0 - Live 0xf88bd000
            crc32 3840 1 tulip, Live 0xf88a8000
            af_packet 12552 2 - Live 0xf88aa000

            [7.4.]
            debian:~# cat /proc/ioports
            0000-001f : dma1
            0020-0021 : pic1
            0040-005f : timer
            0060-006f : keyboard
            0080-008f : dma page reg
            00a0-00a1 : pic2
            00c0-00df : dma2
            00f0-00ff : fpu
            0170-0177 : ide1
            01f0-01f7 : ide0
            0376-0376 : ide1
            03c0-03df : vga+
            03f6-03f6 : ide0
            0cf8-0cff : PCI conf1
            bc00-bcff : 0000:00:11.5
            c000-c0ff : 0000:00:0f.0
            c400-c40f : 0000:00:0f.0
            c400-c407 : ide2
            c408-c40f : ide3
            c800-c803 : 0000:00:0f.0
            c802-c802 : ide3
            cc00-cc07 : 0000:00:0f.0
            cc00-cc07 : ide3
            d000-d003 : 0000:00:0f.0
            d400-d407 : 0000:00:0f.0
            d800-d87f : 0000:00:0e.0
            dc00-dcff : 0000:00:0b.0
            e000-e0ff : 0000:00:07.0
            e000-e0ff : tulip
            e400-e47f : 0000:00:0d.0
            e400-e47f : sata_promise
            e800-e80f : 0000:00:0d.0
            e800-e80f : sata_promise
            ec00-ec3f : 0000:00:0d.0
            ec00-ec3f : sata_promise
            fc00-fc0f : 0000:00:0f.1
            fc00-fc07 : ide0
            fc08-fc0f : ide1

            debian:~# cat /proc/iomem
            00000000-0009fbff : System RAM
            0009fc00-0009ffff : reserved
            000a0000-000bffff : Video RAM area
            000cc800-000cd7ff : Extension ROM
            000e0000-000effff : Extension ROM
            000f0000-000fffff : System ROM
            00100000-3ffeffff : System RAM
            00100000-002b24f6 : Kernel code
            002b24f7-0033d13f : Kernel data
            3fff0000-3fff7fff : ACPI Tables
            3fff8000-3fffffff : ACPI Non-volatile Storage
            bdc00000-cdbfffff : PCI Bus #01
            c0000000-c7ffffff : 0000:01:00.0
            cdd00000-cfdfffff : PCI Bus #01
            ce000000-ceffffff : 0000:01:00.0
            cff60000-cff7ffff : 0000:00:0d.0
            cff60000-cff7ffff : sata_promise
            cfffe000-cfffefff : 0000:00:0d.0
            cfffe000-cfffefff : sata_promise
            cffff000-cffff7ff : 0000:00:0e.0
            cffffe00-cffffeff : 0000:00:0b.0
            cfffff00-cfffffff : 0000:00:07.0
            cfffff00-cfffffff : tulip
            d0000000-d1ffffff : 0000:00:00.0
            fec00000-fec00fff : reserved
            fee00000-fee00fff : reserved
            fff80000-ffffffff : reserved


            [7.5.]
            0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP]
            Host Bridge (rev 01)
            Subsystem: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge
            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
            Stepping- SERR- FastB2B-
            Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
            SERR- 
            Capabilities: [c0] #08 [0060]
            Capabilities: [68] Power Management version 2
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
            PME(D0-,D1-,D2-,D3hot-,D3cold-)
            Status: D0 PME-Enable- DSel=0 DScale=0 PME-
            Capabilities: [58] #08 [8001]

            0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge
            [K8T800 South] (prog-if 00 [Normal decode])
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
            Stepping- SERR+ FastB2B-
            Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
            SERR- Reset- FastB2B-
            Capabilities: [80] Power Management version 2
            Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
            PME(D0-,D1-,D2-,D3hot-,D3cold-)
            Status: D0 PME-Enable- DSel=0 DScale=0 PME-

            0000:00:07.0 Ethernet controller: Lite-On Communications Inc LNE100TX
            (rev 20)
            Subsystem: Netgear FA310TX
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
            Stepping- SERR- FastB2B-
            Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
            SERR- TAbort-

            [7.6.]
            debian:~# cat /proc/scsi/scsi
            Attached devices:



            [X.] Since I am unable to reproduce the problem with the old binary
            2.2.20 on normal IDE disks but I can on the same disks when using the
            2.6.5 compiled kernel I am making the wild assumption that it is not
            hardware related. I tried to find roughly where the problem was using

            objdump -d /mnt/hdc2/usr/src/kernel-source-2.6.4/drivers/block/ll_rw_blk.o
            objdump -d /usr/src/linux/fs/bio.o
            objdump -d fs/mpage.o

            and trying to use the offsets from the oops to see where the problem
            was but I was unable to locate the offset in each of the files. This is
            probably more my inexperience than anything else. If there is a decent
            tutorial on how to do this sort of thing I would appreciate a pointer
            or two. So far the only thing I can think of is that my compiler is
            dodgy or I am having spurious memory problems.

            yours
            Harry

The more observant among you will have noticed that the report is fairly lengthy. This is because I followed the kernel bug reporting procedure and tried to provide as much information as possible. If you want a reply you would do well to follow the guidelines and help the kernel hackers as much as you can; they are busy people.