Fedora Users — Re: Periodic Fedora 9 system hangs with jumpy mouse

On Tue, Jun 24, 2008 at 3:49 PM, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> wrote:
> Once you've done that run it for a bit and see if it seems to be
> gradually eating through swap. The overcommit test will probably work
> sanely as well with 1GB+ of swap 8)

Bad news.  The system is still periodically "hanging".  This time
I have more information, and I don't believe it to be memory/swap
related at all.  It actually is looking more like a kernel issue (?)

I was able to use another computer and keep an ssh/shell session
open onto the system.  Then, starting from a freshly booted system
and logging in, I ran only firefox.  After reading /. for a while
and scrolling a lot, the system eventually exhibited the same
hanging behavior.  The mouse would move jumpily, but no other
interaction or screen output/updates would occur.

However the remote ssh session was still alive and interactive,
so the system itself was not dead.

Note that nothing showed up in the syslog, even with the
vm overcommit kernel parameters set as Alan suggested.
Furthermore, the system still had plenty of free memory left
and the swap was 0% used.  vmstat showed no paging at
all.  From the shell, the system still appeared to operate
normally, except that the Xorg process was pegging one of
the CPU cores at 100%.

I tried to capture stuff from the /proc entry for that process
(some of it included below).  I was unable to gdb attach to
the Xorg process (gdb would hang).  And also the Xorg process
was not killable.  Finally I tried kill -KILL on it, and it sort
of got half-killed.  The exe symlink in proc was blanked out,
and the process showed up as the name "[Xorg]" (my
understanding is that the bracket syntax indicates a process
that is dying/zombied).  However the process ID remained,
and it still showed as consuming 100% cpu in a running (R)
state, which an actual zombie would never do.

Here's the output of /proc/2682/status, before I attempted to kill it:

Name:	Xorg
State:	R (running)
Tgid:	2682
Pid:	2682
PPid:	2681
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	256
Groups:	
VmPeak:	  611884 kB
VmSize:	   34632 kB
VmLck:	       0 kB
VmHWM:	   60628 kB
VmRSS:	   25496 kB
VmData:	   18560 kB
VmStk:	      84 kB
VmExe:	    1724 kB
VmLib:	    9180 kB
VmPTE:	     592 kB
Threads:	1
SigQ:	1/16375
SigPnd:	0000000000000000
ShdPnd:	0000000000002000
SigBlk:	0000000000000000
SigIgn:	0000000000301000
SigCgt:	00000001d1806ecb
CapInh:	0000000000000000
CapPrm:	ffffffffffffffff
CapEff:	ffffffffffffffff
Cpus_allowed:	00000003
Mems_allowed:	1
voluntary_ctxt_switches:	334819
nonvoluntary_ctxt_switches:	34772
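
For anyone reading the signal fields above: each one is a hex
bitmask in which bit 0 stands for signal 1, bit 1 for signal 2,
and so on.  The following is only a rough sketch for decoding
them, not something I ran on the hung system.  The helper names
decode_sigmask and signal_name are made up for illustration, and
the number-to-name mapping assumes it is run on the same kind of
Linux/x86 machine the dump came from.

```python
import signal


def decode_sigmask(hex_mask):
    """Return the signal numbers whose bits are set in a /proc status mask."""
    mask = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def signal_name(number):
    """Map a signal number to its name on the machine running this script."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Real-time or otherwise unnamed signals on this platform.
        return "signal %d" % number


if __name__ == "__main__":
    # Values copied from the status dump above.
    for field, value in [("ShdPnd", "0000000000002000"),
                         ("SigIgn", "0000000000301000")]:
        names = [signal_name(n) for n in decode_sigmask(value)]
        print(field, names)
```

By this reading the ShdPnd value has only bit 13 set, which is
signal number 14, and SigIgn covers signal numbers 13, 21 and 22.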

Note that the context switch counts had increased every
time I checked, even after the process was in the half-killed
state.
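
Rather than eyeballing the counters by hand, the check could be
scripted roughly as below.  This is a minimal sketch assuming the
status file keeps the "key: value" layout shown above; the
function names (parse_status, ctxt_delta, sample) and the
one-second interval are my own choices for illustration.

```python
import time


def parse_status(text):
    """Parse the text of /proc/<pid>/status into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line that has no "key: value" shape
            fields[key.strip()] = value.strip()
    return fields


def ctxt_switches(fields):
    """Return (voluntary, nonvoluntary) context switch counts."""
    return (int(fields["voluntary_ctxt_switches"]),
            int(fields["nonvoluntary_ctxt_switches"]))


def ctxt_delta(before_text, after_text):
    """Return how much each counter grew between two status snapshots."""
    before = ctxt_switches(parse_status(before_text))
    after = ctxt_switches(parse_status(after_text))
    return (after[0] - before[0], after[1] - before[1])


def sample(pid, interval=1.0):
    """Read the status file twice, `interval` seconds apart."""
    path = "/proc/%d/status" % pid
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return ctxt_delta(before, after)
```

Calling sample(2682) from the ssh session would return the
growth of both counters over one second.  Which of the two grows
might hint at whether the process is still blocking now and then
(voluntary) or is only being preempted (nonvoluntary), though I
would treat that as a guess rather than a diagnosis.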

All of the other processes for my user, except for Xorg,
could be killed off cleanly.  Only the Xorg process remained.

Also during this time, I could see no significant system
I/O occurring (where is the iostat command btw?), and
vmstat showed a very calm system.

Again, this is running kernel 2.6.25.6-55.fc9.i686, with
a 2-cpu smp (1 cpu with hyperthreading).

Is there anything I can do to capture more useful information
the next time this happens?
-- 
Deron Meranda
