Torsten Kaiser wrote:
On 9/27/07, Tejun Heo <[email protected]> wrote:
Tejun Heo wrote:
Torsten Kaiser wrote:
Comparing the driver/ata directory from rc3-mm1 and rc4-mm1 the
following change looked the most suspicions to me:
http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=blobdiff;f=drivers/ata/sata_sil24.c;h=3dcb223117be9739ee04d70b6bfc776a4b839a3f;hp=e0cd31aa8002350add53ba6ff07493e503275244;hb=020bc1bd8d369a77bd9379cd9763ac0057651753;hpb=8d4bdf8087e682df98bdb856f6ad451bf6d597e7
That after rc4-mm1 the sata_sil24.c did not change anymore also
matches the occurrence of the error.
To confirm my theorie I exchanged the sata_sil24.c from rc8-mm1 with
the version from rc3-mm1.
I was able to boot the resulting kernel successfully 5 times, without
the error happening again.
Thanks a lot for chasing down the problem. The changed code is address
initialization path and it's weird that it causes intermittent failures,
not a consistent one.
Anyways, does the attached patch fix the problem?
I'm starting to *really* hate that bug.
My analysis was wrong, as I booted to modified 2.6.23-rc8-mm1 this
morning, that failed too. (Same error messages as -rc7-mm1 from the
first mail in this thread.)
So it's not that change that causes the breakage.
And I'm not really finding a good pattern to what boots fail and what work.
It seems to only fail, if I completely power off the system for
several hours. (Using the physical switch at the backside of the
powersupply, not the normal soft-off)
One of the five boots I tried yesterday, I also powered the system
completely off that way, but only leaving it off ~10..20 seconds
seemed not to trigger the bug.
But I still think that is not a hardware failure, as the -rc3-mm1
kernel never showed that error, even when I used it several times
after the first -rc4-mm1 failures.
If not, can you add printk of iomap[SIL24_PORT_BAR], offset, initialized
cmd_addr and scr_addr in the loop and see whether anything is different
between when the driver works and fails.
Should I do this anyway?
I compared the dmesg form good and bad boots with -rc7-mm1 but could
not see any difference, so do you think that these additional
diagnostics could show a difference?
Or could you suggest any other debugging options I should try?
I think since its a reproducible problem, I think it's easiest to get
you straight to git-bisect. In this case, that would be
1. Clone branch "upstream" of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
2. Test. If bug persists, you have narrowed down the problem to the -mm
changes from the SATA developers, that are to be sent for 2.6.24. If
the problem does not persist, then it's a problem added in the -mm
patchset alone, which carries few ATA patches outside of libata-dev.git.
3. If the problem is in libata-dev.git#upstream (likely), you can now
use git-bisect to find the specific commit that causes the problems.
Read the git-bisect man page for full details, but the basics are
a) start with a known good point (v2.6.22? v2.6.23?) and
known bad point (HEAD, aka the most recent commit in
libata-dev.git#upstream)
b) build and boot kernels, marking each as known-good or
known-bad.
c) This process will systematically narrow down the problem
to a single git commit.
Regards,
Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]