Les Mikesell wrote:
Roger Heflin wrote:
I can't recall ever being in a position of "having to bring in new
hardware". What scenario forces this issue on you? I haven't
noticed a shortage of vendors who will sell RHEL supported boxes.
But it sounds like you have an interesting job...
More CPU power needed to do the job. And the new boxes aren't
officially RHEL supported (and sometimes won't even boot with the
latest update, but will work with the latest Fedora/kernel.org kernel).
Something faster than IBM could sell you?
At the time, yes; this was before IBM sold AMD stuff, and the early,
though troublesome, Athlons were faster than the Intel stuff.
I had a subset of machines (about 250) all of which had
reached 500+ days of uptime (the uptime counter rolled over).
Wasn't that fixed circa RH8? I had some 7.3 machines roll over twice.
It was pre-RH8.
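For reference, that roughly-500-day figure lines up with a 32-bit tick
(jiffies) counter wrapping; a quick back-of-the-envelope check, assuming
HZ=100, which was the default on those 2.4-era kernels:

    # When does a 32-bit jiffies counter wrap?  HZ=100 assumed (2.4 default).
    HZ = 100
    wrap_days = 2**32 / HZ / 86400
    print(round(wrap_days, 1))   # ~497.1 days, i.e. the "500+ days" rollover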
The issue with all OSes is that no one tests enough to catch these
high-MTBF issues. In a big environment, a machine crashing once per
1000 days of uptime works out to about one machine a day crashing
because of software once you have on the order of a thousand machines.
Typically the enterprise OSes aren't even close to that level, and
while Fedora is worse, it is just not that much worse.
I don't think RH7.3 with its final updates or Centos3.x (where x>1) had
anything approaching a software crash per 1000 days - at least not in
the base system and common services. I mostly skipped the 4.x series
because I didn't trust the early 2.6 kernels at all, but 5.1 seems solid.
Both of them have issues if you are running NFS servers with lots of clients;
other than that they are pretty stable, but if you are relying on NFS heavily
that is a show-stopper. Once you get a working, stable setup, if you really
want stability you don't touch it: no matter how well anyone tests things, they
will miss something, and things get worse the more different applications you
are running, all doing different odd things, each of which may find one of the
bugs no one at Red Hat/SUSE found in their testing.
And on top of that, I have had trivial driver changes in the enterprise OSes
cause huge performance regressions. An FC driver update changed the queue depth
to 64, which dropped throughput to 30% of what it was before on certain external
FC RAID disk arrays. This affected SLES9 SP3 (the 9 SP1/SP2 kernel was OK),
SLES10, any kernel.org kernel with the newer driver, and RHEL4 (all of them at
the time), so no update can be counted on not to cause issues. The error was not
seen by the driver maintainer until they got one of the external arrays to test
with and saw it compared to a competitor's board that was 3x faster under the
newer kernel but almost identical under the older kernel; both RHEL and SLES
testing missed it. To fix it we actually had to update to an unreleased driver
that allowed the queue depth to be changed back down (none of the released
updates at the time fixed it), and wait for an update from SLES. To get this
fixed it was far easier to work with the upstream driver maintainer and get them
to push the update to the enterprise vendors than to try to get the enterprise
vendors to find and fix the problem. I was told by a different upstream
maintainer that the enterprise vendors typically pushed any serious issues
directly to them and did very little with the issues themselves, assuming you
could get past their first-line support people.
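For what it's worth, on reasonably current kernels you can usually inspect (and
sometimes lower) the per-LUN queue depth through sysfs without waiting for a new
driver; a rough sketch, where the device name and target depth are placeholders
and writability depends on the HBA driver:

    # Sketch: read, and optionally lower, the SCSI queue depth for one LUN
    # via sysfs.  "sdb" and 16 are placeholders; whether the attribute is
    # writable depends on the HBA driver, and lowering it needs root.
    from pathlib import Path

    DEV = "sdb"
    TARGET_DEPTH = 16
    attr = Path("/sys/block/%s/device/queue_depth" % DEV)
    print(DEV, "queue depth =", attr.read_text().strip())
    # attr.write_text(str(TARGET_DEPTH))   # uncomment to actually change it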
The big problem is that the testing has to cover all of: it does not crash, it
runs at roughly the same speed as before, and it still gives the same answer;
and even if one runs every test they know about, some configuration will still
slip through for a given setup.
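As a concrete illustration of those three checks, a minimal sketch of the kind
of harness I mean; run_workload(), the baseline numbers, and the 30% speed
tolerance are all placeholders for whatever job you actually care about:

    import time

    def run_workload():
        # stand-in for the real job; replace with the actual workload
        return sum(range(10_000_000))

    BASELINE_SECONDS = 2.0            # timing from the known-good kernel (assumed)
    BASELINE_RESULT = 49999995000000  # answer from the known-good kernel

    def check_update():
        start = time.time()
        try:
            result = run_workload()                # 1. it does not crash
        except Exception as exc:
            return "FAIL: crashed (%s)" % exc
        elapsed = time.time() - start
        if elapsed > 1.3 * BASELINE_SECONDS:       # 2. roughly the same speed
            return "FAIL: slower (%.1fs vs %.1fs)" % (elapsed, BASELINE_SECONDS)
        if result != BASELINE_RESULT:              # 3. still the same answer
            return "FAIL: wrong answer"
        return "OK"

    print(check_update())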
I guess my experience is that even with enterprise updates, at least 25-50% of
the time there is a serious regression (speed, crash, wrong answer), so one has
to carefully consider what is gained by doing that update. The testing required
for a full update is really no better than the testing required to go from F7 to
F8, and Fedora puts out new kernels faster, so getting a fix into the stream is
a lot quicker than with the enterprise OSes. Once you get one that works
correctly on a given piece of HW, you stop updating. Some of the things I have
run into on an update are things you would never have thought to test for, so
you have to watch out on any update, and it is best to only update when
required to.
Some of the customers I used to support typically stayed on what was shipped
with the machine, because their validation procedures were fairly extensive and
the update was not worth it when no useful features were to be gained.