Gilbert Sebenste wrote:
Hello all,
I am having an absolutely vexing problem that somebody might be able to
shed some light on.
I just got 2 new computers, both running F7 (Fedora 7). Each one has a
single Seagate 750 GB SATA 3 Gb/s, 7200 RPM drive with 16 MB of cache,
4 GB of RAM, and a Core 2 Quad 6700 on an ASUS motherboard.
OK. I run these computers pretty hard. But I also have two Pentium 4s
that work just as hard, all receiving the same weather data feed from
the National Weather Service (20 MB/sec peak, 1 MB/sec average), and
those run flawlessly for months, until I install a new kernel and reboot.
Yet within 12 hours of startup, a new machine running the same software
as the slower machines, with the exact same data feed, gives me:
kernel: journal commit I/O error
I can still log in, but commands won't run. "shutdown -r now" does not
work, but a manual power-down and reboot clears the problem.
At first I suspected a hard drive fault on both machines, so I put in
replacement drives. That seemed to stop the problem for a few days, so I
closed a Bugzilla report I had open. No such luck: this weekend the
machines went back to crashing every 4-18 hours.
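To check whether the crashes really do recur every 4-18 hours, and whether that interval changes after a kernel change or a drive swap, it may help to measure the gap between successive errors in the system log. The sketch below is a minimal Python example, assuming classic syslog lines of the form "Oct  7 14:02:11 host kernel: journal commit I/O error". The helper name error_intervals_hours, the host name, and the sample timestamps are all hypothetical. Because syslog timestamps omit the year, the caller supplies one. Also note that if the journal is failing, the message may never reach the local disk, so it may only be captured on the console or by a remote syslog host.

```python
from datetime import datetime

# Hypothetical helper. The message text comes from the thread; the log
# format is an assumption (classic syslog, e.g. /var/log/messages):
#   "Oct  7 14:02:11 wx1 kernel: journal commit I/O error"
PATTERN = "journal commit I/O error"


def error_intervals_hours(lines, year=2007):
    """Return the gaps, in hours, between successive journal commit errors."""
    stamps = []
    for line in lines:
        if PATTERN not in line:
            continue
        # The first 15 characters hold "Mon DD HH:MM:SS"; syslog omits the
        # year, so the caller supplies it (the default is an assumption).
        stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    # Pair each error with the next one and convert the gap to hours.
    return [(b - a).total_seconds() / 3600 for a, b in zip(stamps, stamps[1:])]


sample = [
    "Oct  7 02:00:00 wx1 kernel: journal commit I/O error",
    "Oct  7 06:30:00 wx1 sshd[1234]: Accepted password for user",
    "Oct  7 20:00:00 wx1 kernel: journal commit I/O error",
]
print(error_intervals_hours(sample))  # [18.0]
```

On a real machine you would pass an open log file instead of the sample list. The log path depends on the syslog configuration, and this sketch does not handle a December-to-January year rollover.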
I also tried cutting the reads/writes in half by reducing the amount of
data and files coming in, to no effect.
I have:
- Replaced the hard drive 3 times with new ones, to no avail
- Reduced the reads/writes by around half
- Turned off legacy USB support, which also caused my keyboard and mouse
  to stop working with errors (that's been cleared and is OK)
- Filed a Bugzilla report: https://bugzilla.redhat.com/show_bug.cgi?id=318661
Tonight, I tried using the original kernel that came with F7
(2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc7).
Two hours in, so far so good, but I'm not confident.
Meanwhile, the two other machines, 3 GHz Pentium 4s with ASUS
motherboards, purr like kittens.
Has anyone seen anything like this, or know what could be the problem?
As always, grateful for any help, and thanks for reading this!
Gilbert
*******************************************************************************
Gilbert Sebenste
(My opinions only!)
*******************************************************************************
My first port of call would be to suspect a hardware issue with the
motherboards. I recently had a similar problem with a new Pentium 4
board, where the ATA disc interface went offline every 18 hours or so;
after replacing the drive with a SATA one, the system purrs for weeks.
Secondly, the kernel version may be important: Core 2 Quad processors
are fairly new, so a later kernel SHOULD have better support. Maybe try
a development kernel on one of the machines, e.g. 2.6.23.-----
Thirdly, have you run a full fsck on the drives after they fail? Reboot
into single-user mode and run fsck -f. You may find that the problem is
on-disc structure corruption, in which case you then have to find out why.
You do not say which journalling file system you are using: is it
ext3, JFS, ReiserFS, ...?
Finally, have you run memtest86+ on these machines? A memory fault could
be going unnoticed, especially if they do not have ECC memory.
Not sure if this will help, but I hope it is not just noise...
--
Howard Wilkinson
Coherent Technology Limited
23 Northampton Square,
United Kingdom, EC1V 0HL
Phone: +44(20)76907075
Mobile: +44(7980)639379
Email: howard@xxxxxxxxxxx