On Fri, Oct 15, 2004 at 12:00:20PM +0100, Simon Andrews wrote:
> Yesterday we had a problem with a development box running FC2. We had a
> process (chesire.cgi) which went a bit mad and ate all the memory on the
> box (well it is a development box!).
....
> I'd like to be able to figure out what actually went on whilst this was
> happening to see if there's anything we can do to fix it, or if this is
> something which should be bugzilla'd.
....
> Oct 14 12:48:03 bilin1 kernel: Out of Memory: Killed process 8411
> (chesire.cgi).
> Oct 14 12:48:03 bilin1 kernel: chesire.cgi: page allocation failure.
....
> Oct 14 12:48:03 bilin1 kernel: [<02140445>] __alloc_pages+0x2a4/0x2be

How did you kill this process, and what user limits were in effect?

What I suspect is that the process grew to use all memory, both physical
and virtual (swap). Then it was killed and began dumping core. Many of
the pages that needed to be dumped to the core file would have been out
on swap, so the read/dump could have involved up to twice the maximum
virtual memory size of the bad-boy process. Depending on the memory
model that can amount to about 6GB of disk IO, mostly kernel IO. This
massive IO, plus all the page theft, would starve other processes of
the IO they needed to run.

Limits can be set externally or in the C program itself. An extreme
abuse:

  $ ulimit -v 5
  $ ls
  Segmentation fault

My suggestion is to establish limits for your bad-boy process with a
shell wrapper or with setrlimit() calls in the process. See:
getrlimit(2), setrlimit(2) and sysconf(3).

The key is to have it hit an imposed limit soon enough that you can
debug and fix it.

-- 
	T o m  M i t c h e l l
	May your cup runneth over with goodness and mercy
	and may your buffers never overflow.