Re: F7: Trying to figure out why kernel crashes with journal commit I/O error


 



On Mon, 8 Oct 2007, Gilbert Sebenste wrote:

Hello all,

I am having an absolutely vexing problem that maybe somebody might shed some light on.

I just got 2 new computers, both running F7. They each have one Seagate 750 GB SATA 3 Gb/s, 7200 RPM drive with a 16 MB cache. Each machine has 4 GB of RAM and a Core 2 Quad 6700 on an ASUS motherboard.

OK. I run the computers pretty hard. But I have two Pentium 4s that work just as hard, all receiving a weather feed from the National Weather Service at a 20 MB/sec peak (1 MB/sec avg), and they run flawlessly for months until I install new kernels on them and reboot.

The P4 has been around for years, so that type of system has been pretty well tested.

OK, within 12 hours after startup of the new machine, running the same software as the other, slower machines with the exact same data feed, I get

kernel: journal commit I/O error

I can log in, but can't run commands. A manual power-down (shutdown -r now won't work) and reboot clears it fine.

First I suspected a hard drive error on both machines. But then replacement hard drives came in. That seemed to stop the problem for a few days, so I closed a bugzilla I had. Nope, this weekend, it went back to crashing every 4-18 hours.

I tried to cut the reads/writes in half by reducing the amount of data/files coming in, to no effect.

I have:

Replaced the hard drive 3 times with new ones (to no avail)

Reduced the read/writes by around half

Turned off legacy USB support, which also caused my keyboard and mouse to stop working with errors (that's been cleared and is OK)

Filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=318661

Tonight, I tried using the original kernel that came with F7
(2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc7).
As of two hours into this, so far so good, but I'm not confident.

Two other machines, Pentium 4s at 3 GHz with ASUS motherboards, purr like kittens.

Has anyone seen anything like this, or know what could be the problem?

As always, grateful for any help, and thanks for reading this!

Don't assume the problem is related to your heavy disk I/O. Try some other workloads. I like to run a suite of benchmarks on new hardware. They often reveal problems with the initial setup, and are helpful later on when something seems broken, e.g., why did the last kernel update cause disk I/O to slow by 50%?
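
As a rough illustration of a workload that is independent of the weather feed, here is a minimal Python sketch that writes random data to a scratch file, forces it to disk, reads it back, and compares checksums. The function name write_read_verify and its parameters are made up for this example; it is not a substitute for a real benchmark suite, and the read-back will often be served from the page cache rather than the disk, so treat a pass as weak evidence only.

```python
import hashlib
import os
import tempfile
import time


def write_read_verify(directory, block_size=1024 * 1024, blocks=16):
    """Write random blocks to a scratch file, fsync, read back, compare digests.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                chunk = os.urandom(block_size)
                written.update(chunk)
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        write_seconds = time.perf_counter() - start

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(block_size), b""):
                read_back.update(chunk)
    finally:
        os.remove(path)  # always clean up the scratch file

    total_bytes = block_size * blocks
    return {
        "bytes": total_bytes,
        "write_mb_per_s": total_bytes / (1024 * 1024) / max(write_seconds, 1e-9),
        "verified": written.digest() == read_back.digest(),
    }


if __name__ == "__main__":
    # Point this at a directory on the disk under suspicion.
    print(write_read_verify(tempfile.gettempdir()))
```

Recording the throughput figure when the machine is healthy gives a baseline to compare against after a kernel update.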

Are you using x86_64 kernels? I suspect most people with similar workloads will be using x86_64, so you may be encountering problems in specific code that hasn't been thoroughly exercised on i386 kernels. In the past, there have been problems with Red Hat's 4k stack size, particularly during error handling, that can mask the real source of the problem. If you are really stuck with 32-bit kernels, you might try the 16k versions from linuxant.
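
To answer the architecture question and see how often the error recurs, something like the following Python sketch may help. It reports the running kernel release and machine type, then lists log lines containing the error text from the original report. The log path /var/log/messages is an assumption about the syslog setup, and the helper names are invented for this example.

```python
import platform

ERROR_TEXT = "journal commit I/O error"  # message quoted in the original report


def find_journal_errors(lines):
    """Return the log lines that contain the journal commit error text."""
    return [line.rstrip("\n") for line in lines if ERROR_TEXT in line]


def describe_kernel():
    """Return (release, machine); machine is 'x86_64' on a 64-bit kernel."""
    return platform.release(), platform.machine()


if __name__ == "__main__":
    release, machine = describe_kernel()
    print(f"kernel {release} on {machine}")
    log_path = "/var/log/messages"  # assumed location; adjust for your setup
    try:
        with open(log_path, errors="replace") as log:
            for hit in find_journal_errors(log):
                print(hit)
    except OSError as exc:
        print(f"could not read {log_path}: {exc}")
```

Reading the system log usually requires root. The timestamps on the matching lines show whether the failures cluster around a particular time or job.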



--
George N. White III  <aa056@xxxxxxxxxxxxxx>

