On Fri, Oct 15, 2004 at 12:00:20PM +0100, Simon Andrews wrote:
> Yesterday we had a problem with a development box running FC2. We had a
> process (chesire.cgi) which went a bit mad and ate all the memory on the
> box (well it is a development box!).
....
> I'd like to be able to figure out what actually went on whilst this was
> happening to see if there's anything we can do to fix it, or if this is
> something which should be bugzilla'd.
....
> Oct 14 12:48:03 bilin1 kernel: Out of Memory: Killed process 8411
> (chesire.cgi).
> Oct 14 12:48:03 bilin1 kernel: chesire.cgi: page allocation failure.
....
> Oct 14 12:48:03 bilin1 kernel: [<02140445>] __alloc_pages+0x2a4/0x2be

How did you kill this process, and what user limits were in effect?

What I suspect is that the process grew to use all memory, both physical
and virtual (swap). Then it was killed and began dumping core. Many of
the pages that needed to be dumped to the core file would have been out
on swap, so the read/dump could have involved up to twice the maximum
virtual memory size of the bad-boy process. Depending on the memory
model that can amount to about 6GB of disk IO, mostly kernel IO. This
massive IO, plus all the page theft, would starve other processes of
the IO they needed to run.

Limits can be set externally or in the C program itself. An extreme
abuse:

  $ ulimit -v 5
  $ ls
  Segmentation fault

My suggestion is to establish limits for your bad-boy process with a
shell wrapper or with setrlimit() calls in the process. See:
getrlimit(2), setrlimit(2) and sysconf(3).

The key is to have it hit an imposed limit soon enough that you can
debug and fix it.

-- 
	T o m  M i t c h e l l
	May your cup runneth over with goodness and mercy
	and may your buffers never overflow.