On Tue, Jun 24, 2008 at 3:49 PM, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> wrote:
> Once you've done that run it for a bit and see if it seems to be
> gradually eating through swap. The overcommit test will probably work
> sanely as well with 1GB+ of swap 8)

Bad news. The system is still periodically "hanging". This time I have
more information, and I don't believe it to be memory/swap related at
all. It actually looks more like a kernel issue (?)

I was able to use another computer to keep an ssh shell session open
on the system. Then, starting from a freshly booted system and logging
in, I ran only firefox. After reading /. for a while and scrolling a
lot, the system eventually exhibited the same hanging behavior. The
mouse would move jumpily, but no other interaction or screen
output/updates would occur. However, the remote ssh session was still
alive and interactive, so the system itself was not dead.

Note that nothing showed up in the syslog, even with the vm overcommit
kernel parameters set as Alan suggested. Furthermore, the system still
had plenty of free memory left and the swap was 0% used. vmstat showed
no paging at all.

From the shell, the system still appeared to operate normally, except
that the Xorg process was pegging one of the CPU cores at 100%. I
tried to capture what I could from the /proc entry for that process
(some of it included below). I was unable to attach gdb to the Xorg
process (gdb would hang), and the Xorg process could not be killed
with a plain kill.

Finally I tried kill -KILL on it, and it sort of got half-killed. The
exe symlink in /proc was blanked out, and the process showed up with
the name "[Xorg]" (my understanding is that the bracket syntax
indicates a process that is dying/zombied). However, the process ID
remained, and it still showed as consuming 100% CPU in a running (R)
state, which an actual zombie would never do.
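In case it helps anyone repeat the /proc capture, here is a minimal
sketch in Python of how it could be scripted. It is only an
illustration under assumptions: the helper names (parse_status,
decode_sigmask, ctxt_switch_delta) are made up for this example, and
it assumes the usual "Key: value" layout of /proc/<pid>/status with
bit 0 of a signal mask standing for signal 1.

```python
def parse_status(text):
    """Parse the 'Key: value' lines of a /proc/<pid>/status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def decode_sigmask(hex_mask):
    """Return the signal numbers set in a hex mask such as SigPnd or ShdPnd.

    Assumes bit 0 of the mask corresponds to signal 1.
    """
    mask = int(hex_mask, 16)
    return [n + 1 for n in range(mask.bit_length()) if (mask >> n) & 1]


def ctxt_switch_delta(before, after):
    """Difference in context-switch counters between two parsed snapshots."""
    keys = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    return {k: int(after[k]) - int(before[k]) for k in keys}


def read_status(pid):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())
```

Calling read_status(2682) twice a few seconds apart and passing both
results to ctxt_switch_delta would show whether the counters are still
moving. decode_sigmask("0000000000002000") returns [14], which on
Linux/x86 is SIGALRM.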
Here's the output of /proc/2682/status, before I attempted to kill it:

Name: Xorg
State: R (running)
Tgid: 2682
Pid: 2682
PPid: 2681
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups:
VmPeak: 611884 kB
VmSize: 34632 kB
VmLck: 0 kB
VmHWM: 60628 kB
VmRSS: 25496 kB
VmData: 18560 kB
VmStk: 84 kB
VmExe: 1724 kB
VmLib: 9180 kB
VmPTE: 592 kB
Threads: 1
SigQ: 1/16375
SigPnd: 0000000000000000
ShdPnd: 0000000000002000
SigBlk: 0000000000000000
SigIgn: 0000000000301000
SigCgt: 00000001d1806ecb
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
Cpus_allowed: 00000003
Mems_allowed: 1
voluntary_ctxt_switches: 334819
nonvoluntary_ctxt_switches: 34772

Note that the context switch counts increased every time I checked,
even after the process was in the half-killed state.

All of the other processes for my user could be killed off cleanly.
Only the Xorg process remained.

During this time I could see no significant system I/O occurring
(where is the iostat command, btw?), and vmstat showed a very calm
system.

Again, this is running kernel 2.6.25.6-55.fc9.i686, with 2-CPU SMP
(1 CPU with hyperthreading).

Is there anything I can do to capture more useful information the next
time this happens?
--
Deron Meranda

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list