Fedora Users — Re: Periodic Fedora 9 system hangs with jumpy mouse

On Wed, Jun 25, 2008 at 3:44 PM, g <geleem@xxxxxxxxxxxxx> wrote:
>> I also tried kills in the following order:
>>
>>  kill -TERM  (-15)
>>  kill -SEGV  (-11)
>>  kill -KILL (-9)
>
> this is not same as what i was showing you above. use numbers, not words.
> type it as "kill -15 'pid#'" and use '-7' if '-15' does not work.
> there is a difference in what i am showing you and what you are using.

No, there's no difference.  These are all equivalent (on Linux):
  kill -15 pid
  kill -TERM pid
  kill -s 15 pid
  kill -s TERM pid

I got in the habit of using symbolic names because I work on
a lot of different kinds of Unix systems (not just Linux), and signal
numbers are not always the same across OSes, but signal names are.
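
If you want to check the mapping on whatever box you happen to be
on, the shell can tell you.  A minimal sketch, assuming a POSIX-style
shell whose kill supports "-l" with a number; "signame" is just a
made-up helper name, not a standard command:

```shell
# signame is a hypothetical helper, not a standard command.
# Print the symbolic name the local system gives to each signal number.
signame() {
    for num in "$@"; do
        # "kill -l N" prints the name (without the SIG prefix) for number N.
        printf '%s = SIG%s\n' "$num" "$(kill -l "$num")"
    done
}

signame 9 15
```

Numbers like 9 and 15 are the same nearly everywhere; others are
not (7 is SIGBUS on Linux/x86, but as far as I know SIGBUS is 10 on
the BSDs), which is why I stick to names.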

* * *

Anyway, I had another runaway Xorg.  This time I was using
the scrollbar on a simple gnome-terminal window, not Firefox.  It
seems that this problem almost always occurs while I'm using
a scrollbar of some sort.  That may or may not be coincidence,
but I'm leaning toward not at this point.

Regarding signals: this time, while the Xorg process was
running at 100% cpu, I monitored the /proc/2384/status and
/proc/2384/task/2384/status files.  Before I even tried any
kill, I noticed that ShdPnd (shared pending signals) would
toggle randomly between all zeros and 00..002000, sometimes
for the process and sometimes for the task/thread.  The wait
channel (wchan) was 0 every time I looked.
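
For anyone decoding those masks: the fields are hexadecimal
bitmasks in which bit N-1 stands for signal N, so 0x2000 (bit 13)
is signal 14, which is SIGALRM on Linux.  A small sketch of a
decoder, assuming a shell with 64-bit arithmetic; "decode_sigmask"
is a made-up helper name:

```shell
# decode_sigmask is a hypothetical helper, not an existing tool.
# Usage: decode_sigmask HEXMASK  (e.g. the ShdPnd value from /proc/PID/status)
# Prints the signal numbers whose bits are set, lowest first.
decode_sigmask() {
    mask=$((0x$1))
    sig=1
    out=""
    # The kernel uses bit (N-1) of the mask for signal N; Linux has 64 signals.
    while [ "$sig" -le 64 ]; do
        if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
            out="$out${out:+ }$sig"
        fi
        sig=$((sig + 1))
    done
    echo "$out"
}

decode_sigmask 0000000000002000   # prints 14
```

"kill -l 14" then gives the name for the number.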

I went ahead and stopped (SIGSTOP) the Xorg parent process
(gdm-simple-slave) just to make sure it wasn't sending signals.
It wasn't.  Other than the Xorg process, top showed an otherwise
practically idle system.

When I sent signals via kill to the process, I could occasionally
see those signal bits show up in the ShdPnd bitmask for a
second, and then it would go all zeros again.  Yes, I tried all
sorts of signals, including 7 (SIGBUS) too.  Nothing had any
noticeable effect on the Xorg process.

I then sent SIGSTOP to the Xorg process itself.  It remained
in a running state consuming 100% cpu (odd), but the signal
mask changed, and any signals I sent after that could be seen
accumulating in the pending signal mask field (as I would
expect).  After a later SIGCONT, though, the pending signals
would go back to 0 (and occasionally 0x2000 temporarily).
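
For comparison, this is roughly what a healthy process does
with the same sequence.  A rough sketch, assuming Linux with /proc
mounted; it uses a throwaway sleep process rather than Xorg, and
the variable names are arbitrary:

```shell
# Throwaway demo: a stopped process keeps a SIGTERM pending until SIGCONT.
sleep 60 &
pid=$!

kill -STOP "$pid"
# Wait until the kernel reports the process as stopped (State: T);
# otherwise the TERM below could be handled before the STOP.
tries=0
while [ "$tries" -lt 5 ]; do
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
    [ "$state" = "T" ] && break
    sleep 1
    tries=$((tries + 1))
done

kill -TERM "$pid"
# Signals sent with kill are process-wide, so they show up in ShdPnd.
pending=$(awk '/^ShdPnd:/ {print $2}' "/proc/$pid/status")
echo "state=$state ShdPnd=$pending"

kill -CONT "$pid"      # the pending TERM is delivered once it resumes
rc=0
wait "$pid" || rc=$?
echo "exit status: $rc"
```

Here the process really does stop, the TERM bit (0x4000) sits in
ShdPnd until the SIGCONT, and then the process dies with status
128+15.  The hung Xorg did none of that.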

Only when I sent a SIGKILL did anything change.  The process
was effectively killed; the exe symlink was gone, all the
file descriptors were closed, etc.  But the process entry still
remained in the running state and was consuming 100% cpu.
This wasn't just a status-reporting bug; the sluggishness of
typing made it obvious that the cpu really was pegged.
After the SIGKILL, the signal pending masks would always stay
at 0, regardless of any additional kills sent to it.
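
The things I was checking by hand can be collected in one go.
A minimal sketch, assuming Linux with /proc mounted and
permission to read the target's entries; "proc_snapshot" is a
made-up helper name:

```shell
# proc_snapshot is a hypothetical helper name.
# Usage: proc_snapshot PID
proc_snapshot() {
    p="/proc/$1"
    # Scheduler state plus the per-thread and process-wide signal masks.
    grep -E '^(State|SigPnd|ShdPnd|SigBlk|SigIgn|SigCgt):' "$p/status"
    # Kernel function the task is sleeping in; 0 means it is not waiting.
    printf 'wchan:\t%s\n' "$(cat "$p/wchan" 2>/dev/null)"
    # After the SIGKILL the exe link was gone and the fds were closed.
    printf 'exe:\t%s\n' "$(readlink "$p/exe" 2>/dev/null || echo '(gone)')"
    printf 'fds:\t%s\n' "$(ls "$p/fd" 2>/dev/null | wc -l)"
}

# Example: snapshot the current shell.
proc_snapshot "$$"
```

Run against the stuck pid in a loop (say once a second, appended
to a file), this would give a record of how the state and masks
change before and after each kill.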


I still have not seen any messages show up in syslog, dmesg,
Xorg.0.log, or anyplace else I can think to look.  And the rest
of the system appears to be totally operational (but cpu starved).
No weird I/O.  All the filesystems are still quite usable.
But the only way to get Xorg out of its mess is to reboot.

So this looks like some strange kernel interaction with Xorg.
Any ideas?  Is there any other information I can get when
the Xorg process does this again that might help figure this
out?

Thanks
-- 
Deron Meranda
