I think you could use MON instead of Nagios as a better monitoring and
alarming system; it is based on Perl scripts.

----- Original Message -----
From: Mauri Sahlberg <Mauri.Sahlberg@xxxxxxxxxxxxxxxx>
Date: Thu, 02 Sep 2004 11:16:32 +0300
To: For users of Fedora Core releases <fedora-list@xxxxxxxxxx>
Subject: Re: Help: Runaway processes killing server...

> A lot of speculation follows, probably not very light reading and might
> be complete gobbledygook.
>
> Tommy Reynolds wrote:
> | Try adding more swap space. Check the web for how to use an ordinary
> | file for this if you don't have any free disk space. Something like:
> |
>
> Adding swap might help and I certainly hope it will. Under normal load
> the old swap, which was half a gigabyte in size, was in practice unused.
> Now the total amount of swap is four and a half gigabytes, which should
> be a lot more than is required.
>
> Somehow this combination of events and programs caused a very rapid
> consumption of both CPU and memory, which resulted in a state that was
> unrecoverable despite the OOM killer. As the OOM killer killed httpd
> processes, the load coming in from them should have vanished and the
> memory should have been freed. This did not happen, and according to the
> Apache logs (if Apache was able to update its logs at all), the external
> pressure had also vanished; that is, the spammer had stopped loading
> pages when they became unresponsive.
>
> The httpd process seems to peak its usage of CPU and memory upon
> startup, so the OOM killer probably kept killing the "same" innocent
> httpd process over and over again. Meanwhile nothing else got CPU but
> the httpd process that spawned the new ones and the OOM killer that
> killed the httpds that were spawned.
>
> Probably a better workaround for the problem would be limiting the
> resource usage of the apache user and the postgres user, as Alexander
> Dalloz proposes. I'll probably try this if the increased swap does not
> help.
>
> Thanks to both of you.
>
> I could also work around this problem by implementing a script that
> monitors the resource usage of both the postgres and apache users and
> shuts the services down for a while when a preset limit is exceeded or,
> better yet, use Nagios to do this.
>
> What I am actually looking for is clues on how to find out what causes
> the rapid consumption of the resources: where, by whom and how fast this
> actually happens. I'm looking for tools to do better post mortem
> diagnosis, or tools that would gather better information for post
> mortem diagnosis. The /var/log/messages with OOM lines did not help me a
> bit.
>
> I have a hunch that either PHP, httpd or Postgres has a bug in it that
> will cause it to consume everything it can get when certain conditions
> are met. There are switches that can be turned in all three programs
> that might help, but to identify which switches and in which program I
> need more info.
>
> Regards,
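
As a rough illustration of the kind of monitoring script described above,
here is a minimal sketch in Python that sums resident memory per user from
/proc and appends one line per sample to a log, so that there is something
to read after the next lockup. The limit values, the log path, the sampling
interval and all function names are assumptions made for the example, not
settings from this thread; stopping the services is deliberately left out.

```python
#!/usr/bin/env python3
"""Log per-user memory usage from /proc for later post mortem reading.

Hypothetical sketch: user names, limits, log path and interval are assumed.
"""
import os
import pwd
import time
from collections import defaultdict

# Assumed limits, in kB of summed resident memory per user.
LIMITS_KB = {"apache": 1_500_000, "postgres": 1_000_000}
LOG_PATH = "/var/log/usage-watch.log"   # assumed location
INTERVAL_S = 10                          # assumed sampling interval


def parse_status(text):
    """Return (real_uid, rss_kb) from the text of /proc/<pid>/status.

    Kernel threads have no VmRSS line; they are reported with rss_kb == 0.
    """
    uid, rss_kb = None, 0
    for line in text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])      # first value is the real UID
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])   # value is given in kB
    return uid, rss_kb


def sample_proc(proc_root="/proc"):
    """Sum resident memory and count processes per UID."""
    rss, count = defaultdict(int), defaultdict(int)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                uid, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if uid is not None:
            rss[uid] += rss_kb
            count[uid] += 1
    return rss, count


def users_over_limit(rss_by_name, limits):
    """Return the sorted user names whose summed RSS exceeds their limit."""
    return sorted(name for name, kb in rss_by_name.items()
                  if kb > limits.get(name, float("inf")))


def user_name(uid):
    """Map a UID to a login name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def main():
    with open(LOG_PATH, "a") as log:
        while True:
            rss, count = sample_proc()
            by_name = {user_name(uid): kb for uid, kb in rss.items()}
            procs = {user_name(uid): n for uid, n in count.items()}
            watched = " ".join(
                f"{name}:rss={by_name.get(name, 0)}kB,procs={procs.get(name, 0)}"
                for name in sorted(LIMITS_KB))
            over = ",".join(users_over_limit(by_name, LIMITS_KB)) or "-"
            load1 = os.getloadavg()[0]
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
                      f"load1={load1:.2f} {watched} over={over}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a hang
            time.sleep(INTERVAL_S)


if __name__ == "__main__":
    main()
```

Summing VmRSS counts shared pages once per process, so the totals overstate
real usage for a preforking server like httpd; they are still useful for
seeing which user grows and how fast. The timestamps can then be lined up
against the OOM lines in /var/log/messages.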