On 13/01/2006 9:59 a.m., Jeff Garzik wrote:
> Alan Cox wrote:
> > On Iau, 2006-01-12 at 16:55 +1300, Reuben Farrelly wrote:
> > > ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
> > > ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
> > > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >
> > That is the critical bit. The SATA ports have no PCI resources assigned
> > for bus mastering (BAR 4). libata should have driven the device PIO in
> > this case but the resource should have been assigned.
>
> Agreed. This appears to be BIOS assigning bad values to SATA hardware.
> However, libata should recognize this and not attempt to iomap or drive
> the hardware, in that case, rather than oops.
>
> Jeff
Some testing tonight has shown a bit more about where this regression crept in.
In the table below, the release is on the left-hand side and the result of each
reboot is on the right-hand side, after the dash. I had to do this many reboots
to be sure whether the bug was present or not; as you can see from the -mm1
test, it doesn't always show up.
2.6.15          - OK OK OK OK OK
2.6.15-git1     - OK OK OK OK OK OK OK OK
2.6.15-git2     - OK
2.6.15-git6     - OK OK OK OK OK OK OK OK
2.6.15-git12    - OK OK OK OK OK OK OK
2.6.15-rc5-mm3  - OK OK OK(but oopsed in usb) OK OK(but oopsed in usb)
Those oopses in USB were only seen in this release, so it looks likely that
whatever was causing them was fixed soon after.
2.6.15-mm1      - OK OK OOPSED OOPSED OOPSED (all in ATA)
2.6.15-mm2, mm3 - [known to OOPS on this bug frequently, but not tested in this round]
2.6.15-mm4      - OOPSED OK OOPSED TIMEOUT TIMEOUT OOPSED OK
2.6.15-mm1 with no git-acpi.patch - TIMEOUT TIMEOUT OOPSED TIMEOUT OK
OK means the system booted up to single user mode without issue; TIMEOUT means
that the controllers were assigned IRQ 50 and then failed to find any ATA disks;
and OOPSED means that the SATA ports were not assigned IRQs at all and hence the
system oopsed out like this:
ahci: probe of 0000:00:1f.2 failed with error -12
ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c023c873
*pde = 00000000
Oops: 0000 [#1]
<plus a trace and a whole lot more>
Full output on http://www.reub.net/files/kernel/outstanding-kernel-bugs.txt (as
usual)
In summary, the good news is that 2.6.15-git12 (which is the latest Linus tree)
is GOOD and does not seem to exhibit this problem. Not a single -git release
crapped out. It seems some regression was introduced in 2.6.15-mm1 and has been
carried forward through to -mm4 so far, but was never pushed to Linus.
I guess it also suggests that it's not a hardware or BIOS issue, given the sheer
number of successful reboots in a row on the -git kernels. And since 2.6.15-mm1
still fails with git-acpi.patch reverted, I think ACPI is not the cause of it,
at least in this instance. But that's about as far as I have gotten.
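To put a rough number on "sheer number of successful reboots in a row", here is a minimal Python sketch. The outcome lists are transcribed from the table above; the helper names (failure_rate, chance_of_clean_run) are made up for illustration, and the calculation assumes each boot fails independently with a fixed probability, which may well not hold for a timing-dependent bug. TIMEOUT and OOPSED are both counted as failures.

```python
# Boot outcomes transcribed from the table in this mail.
results = {
    "2.6.15-mm1": ["OK", "OK", "OOPSED", "OOPSED", "OOPSED"],
    "2.6.15-mm4": ["OOPSED", "OK", "OOPSED", "TIMEOUT",
                   "TIMEOUT", "OOPSED", "OK"],
}


def failure_rate(outcomes):
    """Fraction of boots that did not reach single user mode cleanly."""
    return sum(o != "OK" for o in outcomes) / len(outcomes)


def chance_of_clean_run(p_fail, boots):
    """Probability of `boots` consecutive clean boots on a kernel that
    fails each boot independently with probability `p_fail` (assumption)."""
    return (1.0 - p_fail) ** boots


if __name__ == "__main__":
    for kernel, outcomes in results.items():
        print(f"{kernel}: observed failure rate {failure_rate(outcomes):.2f}")

    # If -git12 had the same failure rate as -mm1, how likely would its
    # 7 clean boots in a row be?
    p = failure_rate(results["2.6.15-mm1"])
    print(f"chance of 7 clean boots at that rate: {chance_of_clean_run(p, 7):.4f}")
```

Under those assumptions, a kernel as flaky as -mm1 would very rarely produce 7 clean boots in a row, which is why I'm fairly comfortable calling the -git kernels good.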
45 reboots later I'm finishing for tonight. Before I go back and hit it with
git bisect to narrow it down any further: Andrew/Jeff, does this make it any
easier to pinpoint, and/or do you have any preliminary patches to test or ideas
as to what other subsystems could be involved?
Thanks,
Reuben