Re: System lockup with SMP Kernel.

On Wed, 04 Feb 2004 08:32:46 +0000, WipeOut <wipe_out@xxxxxxxxxxxxxxxxxxxxx> wrote:

Paul Furness wrote:

Interestingly, I had the system hang on me 5 times today. I then stopped
cron and anacron, and just to make sure I individually ran all the jobs
that anacron might have tried to run (/etc/cron.*/*) without any
trouble.

cron and anacron are still stopped, and I haven't hung since. I'll try
leaving it on overnight tonight and see what happens.

I did recompile the kernel but haven't tried booting it yet, because the
system hasn't hung for a while.

Oh, incidentally (in case this helps) I have Athlons, not Pentiums, so
this doesn't look to be CPU specific.

Watch this space for further updates... :)

P.

I have run a P4 HT system all night on the SMP kernel and it didn't crash. The system was set up as a minimal install, and then Apache, PHP and MySQL were added.

So it looks like the SMP crashing problem is caused by something that gets installed with a workstation or desktop install, and that is possibly run or triggered by cron.

The only difference I can see between the two cron.daily directories is that the workstation install has a "tetex.cron" script in cron.daily, but I don't think that would be the cause of the problem.

The other thing that could be freaking it out is prelink: maybe whatever prelink does when it runs hangs a workstation install but not a minimal one.



I'm willing to bet that part of your particular hangs (the kind sometimes caused by cron on SMP machines at night) is caused by the symbolically linked file 00-logwatch, or rather by how the shells are implemented on SMP computers for that particular package's (logwatch-4.3.2-2.1) cron job. I didn't look to see which "group" of packages it comes with, probably Developer or System Tools. It wouldn't be in a minimal install.

The quick fix is to just "rm 00-logwatch" in /etc/cron.daily, or remove the whole package, unless you are using logwatch to track special log reports, etc. You can always generate the logwatch reports manually by executing "/etc/log.d/scripts/logwatch.pl". The alternative is to remove 00-logwatch and replace it with the following script, giving the script the same name. The "00-" prefix is there to make sure logwatch runs first, before anything else in cron.daily is executed.

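#!/bin/bash
# Run logwatch directly by its full path instead of through a symlink
/etc/log.d/scripts/logwatch.pl
# Optional: note in the cron log that this script ran
echo "logwatch done" >> /var/log/cron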

You can leave off the last line if desired. I just put it in to be able to see in the cron log that the script was executed (which does not necessarily mean logwatch.pl succeeded).
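A minimal sketch of doing that swap with a shell function, assuming GNU `readlink -f` is available. `replace_link_with_wrapper` is a hypothetical helper name, not part of logwatch or cron. It only touches a path that really is a symlink, resolves the full path of its target, and writes a same-named wrapper containing that path plus the optional log line.

```shell
#!/bin/bash
# Sketch: replace a symlinked cron job with a small wrapper script of the same
# name that calls the link's target by its full path (hypothetical helper).
replace_link_with_wrapper() {
    local link="$1"
    local target
    # Only act on symbolic links; leave regular scripts alone.
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link" >&2
        return 1
    fi
    # Resolve the full path the link ultimately points at (GNU readlink -f).
    target=$(readlink -f "$link") || return 1
    rm -f "$link"
    # Write the wrapper: shebang, the target by full path, and the optional
    # log line so /var/log/cron shows the wrapper ran.
    cat > "$link" <<EOF
#!/bin/bash
"$target"
echo "$(basename "$link") done" >> /var/log/cron
EOF
    chmod 755 "$link"
}
```

As root, `replace_link_with_wrapper /etc/cron.daily/00-logwatch` would do the replacement. A package upgrade may restore the link, so it may need rerunning after updates.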

Anacron, a #!/bin/sh script, doesn't have any problem running the symbolic link pointing to a Perl script running in a bash shell. Cron, a #!/bin/bash script, apparently does. I think the chain of scripts/links gets tangled up in which processor/memory address to use. I'll leave that up to the "wizards" of SMP. In any event, since I did that, I haven't had any hangs/lockups whatsoever. One caveat to all of this is that I'm running FC1 (testing, fully updated) with kernel-2.4.22-1.2166.nptlsmp, and I did not test on 2149. 2166 is really solid: I'm running big transfers over NFS with no problems.
Just for the record, you don't have to wait for cron to run every night. Edit your crontab file with vi (save a copy of the original first), set the minutes and hours to about three minutes ahead of the computer's current time, and it will execute. cron checks the crontab every minute for jobs that are due. This was also how I found out that cron doesn't check the timestamps anacron keeps for the jobs (/var/spool/anacron) before running them. It will run even though anacron ran just 5 minutes before, or it could attempt to run at the same time. anacron runs 65 minutes after booting.
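A minimal sketch for working out the minute and hour fields a few minutes ahead, assuming GNU `date` (for `-d`). The helper name `cron_test_line` is hypothetical, and the `root run-parts /etc/cron.daily` part assumes the usual Red Hat/Fedora /etc/crontab layout; compare it with the cron.daily line in your own crontab before pasting anything in.

```shell
#!/bin/bash
# Sketch: print a crontab line whose minute and hour fields are N minutes
# from now (default 3), for firing cron.daily by hand while testing.
# Assumes GNU date and the Red Hat style "user run-parts dir" /etc/crontab line.
cron_test_line() {
    local ahead="${1:-3}" dir="${2:-/etc/cron.daily}"
    local fields
    # %M %H = minute and hour; date handles rolling over into the next hour.
    fields=$(date -d "+${ahead} minutes" '+%M %H') || return 1
    printf '%s * * * root run-parts %s\n' "$fields" "$dir"
}
```

Running `cron_test_line` prints a line of the form `MM HH * * * root run-parts /etc/cron.daily`, with the numbers depending on the clock. Restore the saved crontab once the test has fired.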


All things considered, always look for problems with symbolic links in cron jobs. That's a holdover from Unix days when it was necessary to have full path names. It's so easy to write a two-line script that symbolic links ought to be banned in these cases (cron jobs). Unfortunately, if that package is upgraded, it will probably put the link back in.
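One quick way to check for this is to list which jobs in the cron directories are symlinks and what they point to. A minimal sketch; `list_cron_symlinks` is a hypothetical helper, and the default directory list assumes the standard /etc/cron.hourly, daily, weekly and monthly layout.

```shell
#!/bin/bash
# Sketch: list cron jobs that are symbolic links and their targets, so they
# can be checked or replaced with small wrapper scripts.
# With no arguments it scans the standard /etc/cron.* job directories.
list_cron_symlinks() {
    local -a dirs=("$@")
    local d f
    if [ ${#dirs[@]} -eq 0 ]; then
        dirs=(/etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly)
    fi
    for d in "${dirs[@]}"; do
        [ -d "$d" ] || continue
        for f in "$d"/*; do
            # -L is true for any symlink, even one whose target is missing.
            if [ -L "$f" ]; then
                printf '%s -> %s\n' "$f" "$(readlink "$f")"
            fi
        done
    done
    return 0
}
```

On a system with the logwatch package installed, the output would be expected to include an entry for /etc/cron.daily/00-logwatch.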

You can also add a line in some of the individual scripts, such as

	echo "Cron Daily 0anacron finished" >> /var/log/cron

just to see if it executed.
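Instead of editing each script, a similar effect can be had by running the jobs from a wrapper that logs a timestamped line before and after each one; after a hang, a "start" line with no matching "finish" points at the job that was running. This is a minimal sketch, not how Fedora's run-parts works: `run_jobs_logged` is a hypothetical helper, the skipped backup suffixes are an approximation, and the `sync` after each "start" line is only a best effort to get it onto disk before a hard lockup.

```shell
#!/bin/bash
# Sketch: run each executable job in a cron directory in turn, logging a
# timestamped line before and after it. Hypothetical helper, not run-parts.
run_jobs_logged() {
    local dir="$1" log="${2:-/var/log/cron}" job name rc
    for job in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$job" ] && [ -x "$job" ] || continue
        # Skip editor and RPM leftovers (an approximation of run-parts' list).
        case "$job" in
            *~|*.rpmsave|*.rpmorig|*.rpmnew) continue ;;
        esac
        name=$(basename "$job")
        echo "$(date '+%F %T') start $name" >> "$log"
        # Best effort: push the "start" line to disk before a possible hang.
        sync
        if "$job"; then rc=0; else rc=$?; fi
        echo "$(date '+%F %T') finish $name rc=$rc" >> "$log"
    done
    return 0
}
```

From a root shell with the function loaded, `run_jobs_logged /etc/cron.daily` runs the daily jobs one by one; afterwards, check the tail of /var/log/cron.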

If you installed everything, you may have a whole slew of these problems.

HTH

Bob Jones




