Re: bad pmd filemap.c, oops; 2.4.30 and 2.4.32

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Chris,

On Fri, Jan 06, 2006 at 01:54:45PM -0800, Chris Stromsoe wrote:
> On Thu, 5 Jan 2006, Willy Tarreau wrote:
> >On Wed, Jan 04, 2006 at 07:52:36PM -0800, Chris Stromsoe wrote:
> >
> >>I booted 2.4.32 with the aic7xxx patch you pointed me at last week. 
> >>It's been up for a few hours.  I'll let it run for at least a week or 
> >>two and will report back positive or negative results.  After that, 
> >>I'll try 2.4.32 with nosmp and acpi=off.
> >
> >Thanks for your continued feedback, Chris. Your reports are very 
> >helpful, they tend to prove that your hardware is OK and that there's a 
> >bug in mainline 2.4.32 with SMP+ACPI+aic7xxx enabled. That's already a 
> >good piece of information.
> 
> After a little more than one day up with 2.4.32 SMP+ACP+aic7xxx, I got 
> another bad pmd and an oops this morning at 4:23am.  I'm going to boot 
> vanilla 2.4.32 with nosmp and acpi=off.

Well, I'm puzzled. On the one hand, your oopses don't all look the
same, so we could think it's a hardware problem. On the other hand,
your hardware tests did not find anything and 2.6.14 runs fine. BTW,
I also have other machines running in production with an adaptec
29160 like yours and I don't encounter this. It looks like some
memory corruption, but finding what causes it seems very hard. In
fact, it somewhat reminds me the problems encountered by Stephan
von Krawczynski 2.5 years ago. He encountered data corruption when
saving large amounts of data to a DLT connected to an AIC7xxx, and
often had freezes, and sometimes an oops. IIRC, changing the board
for something else fixed his problem.

I've compared the driver between 2.4 and 2.6, and the core has not
changed much, but its interface to the OS has changed a lot, so it's
not easy to identify a potential fix.

Eventhough I don't like this, I would join Roberto's advice to
upgrade to 2.6 and stick to it. If you finally encounter the
same problem on 2.6 after a very long time, then it would be
an indication that the problem is well in your hardware.

> -Chris

Thanks for all your investigations,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux