Re: Problems with FC4 kernels 1658 and 1831

On Thu, Feb 09, 2006 at 11:15:37AM -0800, Peter J. Stieber wrote:

PS = Peter Stieber
PS>> I have three different machines that have been having
PS>> problems with the last two FC4 kernels. The first is a
PS>> 733 MHz Pentium III with 256 MB RAM
PS>> and a 230 GB IDE disk. It acts as an Apache web
PS>> server and a subversion server. The machine gets
PS>> into a mode where it prints the following on
PS>> the console (the numbers may be a little different):
PS>>
PS>> Normal per-cpu:
PS>> cpu 0 hot: low 62, high 186, batch 31, used 99
PS>> cpu 0 cold: low 0, high 62, batch 31, used 57
PS>> HighMem per-cpu: empty
PS>> Free pages:           2972kB (0kB HighMem)
PS>> Active:0 inactive:0 dirty:0 writeback:0 stable:0 free:743 slab:16038
PS>> mapped:22857 pagetables:22109
PS>> DMA: free:1080kB  min:124kB low:152kB high:184kB active:120kB
PS>> inactive:264kB present:16384kB pages_scanned:1840433 all_unreclaimable?
PS>> yes
PS>> lowmem_reserve[]: 0 239 239
PS>> Normal free:1892kB min:1916kB low:2392kB high:2872kB active:13492kB
PS>> inactive:13320kB present:245680kB pages_scanned:787375
PS>> all_unreclaimable? yes
PS>> lowmem_reserve[]: 0 0 0
PS>> HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB
PS>> present:0kB pages_scanned:0 all_unreclaimable? no
PS>> lowmem_reserve[]: 0 0 0
PS>> DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024KB
PS>> 0*2048kB 0*4096kB = 1080kB
PS>> Normal: 1*4kB 0*8kB 2*16kB 2*32kB 0*64kB 4*128kB 1*256kB 0*512kB
PS>> 1*1024KB 0*2048kB 0*4096kB = 1892kB
PS>> HighMem: empty
PS>> Swap cache: add 222217, delete 221977, find 1020061/1033074, race 0+2
PS>> Free swap = 0kB
PS>> Total swap = 524280kB
PS>>
PS>> Notice the machine is out of swap space so there seems to be a process
PS>> that is running amuck, but I can't tell what it is. I have to shut the
PS>> machine off to make progress. This also occurred with 1831.
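
As a sanity check, the figures in the dump above are internally consistent: 743 free pages at 4 kB each gives the 2972 kB on the "Free pages" line, and each zone's buddy list sums to that zone's free figure. The following is a minimal Python sketch of that arithmetic, assuming the usual 4 kB page size on i386; the function name is made up for illustration.

```python
import re

def buddy_total_kb(line):
    """Sum a buddy-allocator line such as 'DMA: 0*4kB 1*8kB ... = 1080kB'."""
    # Only look at the part before '=', so the reported total is not counted.
    blocks = line.split("=")[0]
    # The dump mixes 'kB' and 'KB', so match the unit case-insensitively.
    pairs = re.findall(r"(\d+)\*(\d+)kB", blocks, flags=re.IGNORECASE)
    return sum(int(count) * int(size) for count, size in pairs)

dma = ("DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB "
       "1*1024KB 0*2048kB 0*4096kB = 1080kB")
normal = ("Normal: 1*4kB 0*8kB 2*16kB 2*32kB 0*64kB 4*128kB 1*256kB 0*512kB "
          "1*1024KB 0*2048kB 0*4096kB = 1892kB")

PAGE_KB = 4  # assumption: 4 kB pages

print(buddy_total_kb(dma))                           # free memory in the DMA zone
print(buddy_total_kb(normal))                        # free memory in the Normal zone
print(buddy_total_kb(dma) + buddy_total_kb(normal))  # total free, from buddy lists
print(743 * PAGE_KB)                                 # total free, from 'free:743' pages
```

Both zones are at or below their "min" watermark with no swap left, which is consistent with the OOM killer being invoked.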

DJ = Dave Jones
DJ> When this gets printed, there should also be something in the logs like
DJ> "kernel: killed process foo", which should be the app that was consuming
DJ> your memory.

I'm looking in /var/log/messages. I think this is the type of message you are asking about...

Feb 9 09:39:42 homer kernel: Out of Memory: Killed process 1686 (httpd).

There are a bunch related to the apache daemon, but there are others.

Feb 9 10:32:24 homer kernel: Out of Memory: Killed process 12852 (AutoCleanAll.sh).

This is a script of mine.

Feb 9 10:32:26 homer kernel: Out of Memory: Killed process 23035 (unicode_start).

There are a few of these unicode_start messages.

I also see the following just prior to the problem starting...

Feb  9 04:03:49 homer init: Trying to re-exec init
Feb 9 08:37:44 homer login(pam_unix)[1768]: session opened for user pstieber by (uid=0)
Feb  9 08:37:45 homer ainit: Memory: Failed to release semaphore
Feb  9 08:37:45 homer ainit: Error: No such file or directory
Feb  9 08:37:45 homer ainit: Memory: Failed to release SHM segment
Feb  9 08:37:45 homer ainit: Error: No such file or directory
Feb  9 08:37:45 homer ainit: Memory: Failed to release SHM segment
Feb  9 08:37:45 homer ainit: Error: No such file or directory
Feb  9 08:37:45 homer ainit: No such file or directory
Feb  9 08:37:45 homer ainit: Memory: Failed to release semaphore
Feb  9 08:37:45 homer ainit: Error: No such file or directory
Feb  9 08:37:45 homer ainit: Memory: Failed to release SHM segment
Feb  9 08:37:45 homer ainit: Error: No such file or directory
Feb  9 08:37:45 homer ainit: Memory: Failed to release SHM segment
Feb  9 08:37:45 homer ainit: Error: No such file or directory
Feb  9 08:37:45 homer ainit: No such file or directory
Feb  9 08:37:45 homer  -- pstieber[1768]: LOGIN ON tty2 BY pstieber
Feb 9 08:43:13 homer sshd(pam_unix)[12680]: session opened for user pstieber by (uid=0)
Feb 9 08:43:46 homer sshd(pam_unix)[12680]: session closed for user pstieber
Feb 9 08:43:46 homer sshd(pam_unix)[12705]: session opened for user pstieber by (uid=0)
Feb 9 08:45:40 homer sshd(pam_unix)[12705]: session closed for user pstieber
Feb  9 09:39:33 homer kernel: oom-killer: gfp_mask=0x201d2, order=0
Feb  9 09:39:33 homer kernel: Mem-info:
Feb  9 09:39:33 homer kernel: DMA per-cpu:

That's Alsa related.

Here are all of the "Killed" messages from the dual opteron machine...

Feb 8 17:27:40 maggie kernel: Out of Memory: Killed process 2572 (httpd).
Feb 8 17:31:55 maggie kernel: Out of Memory: Killed process 2573 (httpd).
Feb 8 17:31:58 maggie kernel: Out of Memory: Killed process 2574 (httpd).
Feb 8 17:32:02 maggie kernel: Out of Memory: Killed process 2575 (httpd).
Feb 8 17:39:25 maggie kernel: Out of Memory: Killed process 2576 (httpd).
Feb 8 17:41:00 maggie kernel: Out of Memory: Killed process 2577 (httpd).
Feb 8 17:41:04 maggie kernel: Out of Memory: Killed process 2578 (httpd).
Feb 8 17:41:10 maggie kernel: Out of Memory: Killed process 2579 (httpd).
Feb 8 17:41:13 maggie kernel: Out of Memory: Killed process 3232 (bash).
Feb 8 17:50:17 maggie kernel: Out of Memory: Killed process 10658 (BuildSlamemClus).
Feb 8 17:52:59 maggie kernel: Out of Memory: Killed process 24696 (unicode_start).
Feb 8 17:53:28 maggie kernel: Out of Memory: Killed process 26200 (unicode_start).
Feb 8 17:53:32 maggie kernel: Out of Memory: Killed process 3968 (unicode_start).
Feb 8 17:54:02 maggie kernel: Out of Memory: Killed process 3945 (unicode_start).

It looks like it tries to kill any process that is running when the system runs out of resources, so it may not be the listed processes. This makes sense because the machines never recover once they are in this state. If it kills the problem process you would think the machine would recover.
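
To see at a glance which process names the OOM killer keeps hitting, the "Killed process" lines can be tallied. This is only a rough sketch in Python: the regular expression is based on the message format shown in the logs above, and the function name is hypothetical. A high count only means the process was a frequent victim, not necessarily that it was the one leaking.

```python
import re
from collections import Counter

# Pattern taken from the log lines quoted in this message.
KILLED = re.compile(r"Out of Memory: Killed process (\d+) \((.+?)\)\.")

def tally_kills(text):
    """Count OOM kills per process name in syslog text."""
    return Counter(name for _pid, name in KILLED.findall(text))

sample = """\
Feb 8 17:27:40 maggie kernel: Out of Memory: Killed process 2572 (httpd).
Feb 8 17:41:13 maggie kernel: Out of Memory: Killed process 3232 (bash).
Feb 8 17:52:59 maggie kernel: Out of Memory: Killed process 24696 (unicode_start).
Feb 8 17:53:28 maggie kernel: Out of Memory: Killed process 26200 (unicode_start).
"""

for name, count in tally_kills(sample).most_common():
    print(count, name)
```

For a real run, the text would come from reading /var/log/messages (as root) instead of the inline sample.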

The third machine actually recovered from its problem. It was running 2.6.14-1.1656_FC4. It looks like CVS was causing the problem (this machine is a CVS server), but that is just hard to believe.

Feb  6 07:09:20 marge kernel: Out of Memory: Killed process 28480 (cvs).
Feb  6 10:50:27 marge kernel: Out of Memory: Killed process 30501 (cvs).

PS>> The third machine is a Tyan dual Opteron machine.
PS>> It experienced the problem with 1831 when trying to
PS>> compile a large code.
PS>>
PS>> Has anyone experienced similar problems?

DJ> There were some leaks in the older kernels that
DJ> could explain it, but they usually manifested
DJ> themselves slightly differently. 1831 has them
DJ> fixed though (though there may still be some
DJ> undiscovered, and there's one known problem
DJ> that I was made aware of a day or so ago
DJ> where SELinux leaks memory. I'll get a test kernel
DJ> with that fix built after 2.6.15.4 is released).

Thanks. Is there anything else I can do to help debug?
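
One possible way to catch the culprit before the machine wedges is to log the largest processes at regular intervals, so the record survives a forced power-off. The sketch below is only an illustration under a few assumptions: Python is installed, and /proc/<pid>/status carries Name and VmRSS fields. The function names are hypothetical.

```python
import os

def parse_status(text):
    """Return (name, rss_kb) from the text of a /proc/<pid>/status file."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss_kb = int(value.split()[0])  # value looks like '   1234 kB'
    return name, rss_kb  # kernel threads have no VmRSS line and report 0

def top_processes(limit=5, proc="/proc"):
    """List the `limit` largest processes by resident memory."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return []  # no procfs available on this system
    rows = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc, entry, "status")) as handle:
                name, rss_kb = parse_status(handle.read())
        except OSError:
            continue  # process exited between listdir() and open()
        rows.append((rss_kb, int(entry), name))
    return sorted(rows, reverse=True)[:limit]

if __name__ == "__main__":
    for rss_kb, pid, name in top_processes():
        print(f"{rss_kb:>10} kB  pid {pid:<6} {name}")
```

Run from cron every minute with the output appended to a file, this would leave a trail of memory use leading up to the first oom-killer message. Resident size does not include pages already pushed to swap, so a process that has mostly been swapped out may rank lower than expected.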

Pete

