Fedora Users — Re: Help: Runaway processes killing server...

A lot of speculation follows. It is probably not light reading and
might be complete gobbledygook.

Tommy Reynolds wrote:
| Try adding more swap space.  Check the web for how to use an ordinary
| file for this if you don't have any free disk space.  Something like:
|

Adding swap might help, and I certainly hope it will. Under normal load
the old swap, which was half a gigabyte in size, was in practice unused.
Now the total amount of swap is four and a half gigabytes, which should
be a lot more than is required.

Somehow this combination of events and programs caused a very rapid
consumption of both CPU and memory, which resulted in a state that was
unrecoverable despite the OOM killer.  As the OOM killer killed httpd
processes, the load coming in from them should have vanished and the
memory should have been freed. This did not happen, and according to
the Apache logs (assuming Apache was still able to update its logs) the
external pressure had also vanished; that is, the spammer had stopped
loading pages when they became unresponsive.

The httpd process seems to peak in its CPU and memory usage upon
startup, so the OOM killer probably kept killing the "same" innocent
httpd process over and over again. Meanwhile nothing else got CPU time
but the httpd process that spawned the new ones and the OOM killer that
killed the httpds that were spawned.

Probably a better workaround for the problem would be limiting the
resource usage of the apache user and the postgres user, as Alexander
Dalloz proposes. I'll probably try this if the increased swap does not
help.

Thanks to both of you.

I could also work around this problem by implementing a script that
monitors the resource usage of both the postgres and apache users and
shuts the services down for a while when a preset limit is exceeded or,
better yet, use nagios to do this.
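
Just to make the idea concrete, a rough sketch of such a monitor in
Python could look like the following. The function names and the limits
are made up for illustration, and it only looks at resident memory as
reported in /proc/<pid>/status. Summing VmRSS counts shared pages more
than once, so the totals are an upper bound rather than an exact figure.

```python
import os
import pwd


def rss_kib_from_status(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or 0."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def rss_by_user(proc_root="/proc"):
    """Sum resident memory (kB) per user name over all live processes."""
    totals = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        path = os.path.join(proc_root, entry)
        try:
            uid = os.stat(path).st_uid
            with open(os.path.join(path, "status")) as fh:
                rss = rss_kib_from_status(fh.read())
        except OSError:
            continue  # process exited while we were looking at it
        try:
            user = pwd.getpwuid(uid).pw_name
        except KeyError:
            user = str(uid)  # uid without a passwd entry
        totals[user] = totals.get(user, 0) + rss
    return totals


def over_limit(totals, limits_kib):
    """Return the users whose summed RSS exceeds their (hypothetical) limit."""
    return sorted(u for u, limit in limits_kib.items()
                  if totals.get(u, 0) > limit)


if __name__ == "__main__":
    # Illustrative limits only: 1 GiB for each user, expressed in kB.
    limits = {"apache": 1024 * 1024, "postgres": 1024 * 1024}
    print(over_limit(rss_by_user(), limits))
```

What to do with the returned user names (stop a service, send mail,
feed a nagios check) is deliberately left out of the sketch.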

What I am actually looking for is clues on how to find out what causes
the rapid consumption of resources: where, by whom and how fast it
actually happens. I'm looking for tools to do a better post-mortem
diagnosis, or tools that would gather better information for one. The
/var/log/messages with its OOM lines did not help me a bit.
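
As an illustration of the kind of information I mean, here is a minimal
sketch of a recorder that appends a timestamped line with the load
average and the free memory and swap to a log file every few seconds.
The names, the log path and the interval are assumptions of mine, not an
existing tool. It only answers "how fast", not "by whom".

```python
import time


def parse_meminfo(text):
    """Turn /proc/meminfo text into {field name: numeric value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def snapshot_line(loadavg_text, meminfo_text, now=None):
    """Format one log line from the contents of /proc/loadavg and /proc/meminfo."""
    mem = parse_meminfo(meminfo_text)
    load1 = loadavg_text.split()[0]  # 1-minute load average
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load1=%s MemFree=%d SwapFree=%d" % (
        stamp, load1, mem.get("MemFree", 0), mem.get("SwapFree", 0))


def record(log_path, interval_s=5, count=None):
    """Append a snapshot every interval_s seconds; run forever if count is None."""
    taken = 0
    while count is None or taken < count:
        with open("/proc/loadavg") as fh:
            load = fh.read()
        with open("/proc/meminfo") as fh:
            mem = fh.read()
        # Reopen the log each time so every line reaches the file promptly.
        with open(log_path, "a") as log:
            log.write(snapshot_line(load, mem) + "\n")
        taken += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    record("/var/tmp/resource-snapshots.log")  # hypothetical log location
```

If the machine locks up again, the last lines of such a log would at
least show how quickly free memory and swap were dropping beforehand.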

I have a hunch that either PHP, httpd or Postgres has a bug that makes
it consume everything it can get when certain conditions are met. There
are switches that can be turned in all three programs that might help,
but to identify which switches, and in which program, I need more info.

Regards,