Re: [GIT PATCH] PCI patches for 2.6.15 - retry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 13/01/2006 9:59 a.m., Jeff Garzik wrote:
Alan Cox wrote:
On Iau, 2006-01-12 at 16:55 +1300, Reuben Farrelly wrote:

ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
Unable to handle kernel NULL pointer dereference at virtual address 00000000


That is the critical bit. The SATA ports have no PCI resources assigned
for bus mastering (BAR 4). libata should have driven the device PIO in
this case but the resource should have been assigned.

Agreed.  This appears to be BIOS assigning bad values to SATA hardware.

However, libata should recognize this and not attempt to iomap or drive the hardware, in that case, rather than oops.

    Jeff

Some testing tonight has shown up a bit more about where this regression crept in.

Below the table reads release on left hand side, and the result of a given reboot on the right hand side after the dash. I had to do so many reboots to be sure that the bug was there or not - as you can see from the -mm1 test it doesn't always show up.

2.6.15 - OK OK OK OK OK
2.6.15-git1 - OK OK OK OK OK OK OK OK
2.6.15-git2 - OK
2.6.15-git6 - OK OK OK OK OK OK OK OK
2.6.15-git12 - OK OK OK OK OK OK OK

2.6.15-rc5-mm3 - OK OK OK(but oopsed in usb) OK OK(but oopsed in usb)
   Those oopses in USB were only seen in this release so looks likely whatever
   was causing them was fixed soon after.
2.6.15-mm1 - OK OK OOPSED OOPSED OOPSED all in ATA
2.6.15-mm2 + mm3 - [known to OOPS on this bug frequently but not tested in this round]
2.6.15-mm4 - OOPSED OK OOPSED TIMEOUT TIMEOUT OOPS OK
2.6.15-mm1 with no git-acpi.patch - TIMEOUT TIMEOUT OOPSED TIMEOUT OK

OK means the system booted up to single user mode without issue, TIMEOUT means that the controllers were assigned IRQ 50 and then failed to find any ATA disks and OOPSED means that he SATA ports were not assigned IRQs at all and hence the system oopsed out like this:

ahci: probe of 0000:00:1f.2 failed with error -12
ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c023c873
*pde = 00000000
Oops: 0000 [#1]
<plus a trace and a whole lot more>

Full output on http://www.reub.net/files/kernel/outstanding-kernel-bugs.txt (as
usual)

In summary the good news is that 2.6.15-git12 (which is the latest linus tree)
is GOOD and does not seem to exhibit this problem. Not a single -git release crapped out. It seems some regression was introduced into 2.6.15-mm1 which has been carried forward through to -mm4 so far though but never pushed to Linus. I guess it also suggests that it's not a hardware or bios issue given the sheer number of successful reboots in a row, and I think reverting the git-acpi.patch suggests that ACPI is not the cause of it, at least in this instance. But that's about as far as I have gotten.

45 reboots later I'm finishing for tonight, but before I go back and hit it with
git bisect to narrow it down any further, Andrew/Jeff does this make it any easier to pinpoint, and/or do you have any preliminary patches to test or ideas as to what other subsystems could be involved?

Thanks,
Reuben



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux