Fedora Users — Re: tracking down a lockup problem

| From: Jeffrey Ross <jeff@xxxxxxxxxx>

| I'm having system lockup problems with Fedora 7.  This is running on a
| machine that ran FC6 for long stretches of time without reboot.
| 
| My questions are:
| 
| 1) How to I track down what is causing the lockups
| 2) When I find out what is causing the lockups how do I submit a bug?

Good questions.  Ones that I'm working through at the moment.  Perhaps
what I'm about to say is covered on the wiki but I don't see it.

What do you mean by a lockup?  I mean: the computer no longer responds
to external stimuli: mouse movement, keys struck (including Alt-SysRq),
pings from other computers, ...

(If it isn't really locked up then the Magic System Request Key might
give you useful information; see Documentation/sysrq.txt in the kernel
source, or google.)
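
Before relying on the Magic SysRq key, it is worth checking that it is
enabled at all.  Here is a rough sketch; describe_sysrq is a made-up
helper name, and reading the value as "0 = off, 1 = everything, anything
else = bitmask" is my understanding of sysrq.txt, so check the copy that
matches your kernel:

```shell
# Hypothetical helper: describe a kernel.sysrq value in words.
describe_sysrq() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1 (only some functions enabled)" ;;
    esac
}

# Report the current setting if the proc file is readable.
sysrq_file=/proc/sys/kernel/sysrq
if [ -r "$sysrq_file" ]; then
    echo "SysRq: $(describe_sysrq "$(cat "$sysrq_file")")"
fi

# To enable every SysRq function until the next reboot, as root:
#   echo 1 > /proc/sys/kernel/sysrq
```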

The best case for a lockup is that the kernel has panicked.
Alternatively, it could be locked up with NO signalling of what is
wrong.  Oh well.

Assuming that the kernel has panicked, how can you get hints of what
has gone wrong?  A kernel panic dumps a bunch of diagnostic
information on the "console" and then sits there with interrupts
disabled, ignoring everything.  So you need to have captured the
console output.  Too late to decide this after the crash.

Unfortunately, a panic shuts everything down, including logging.  If
you could issue a dmesg, you would see the messages, but you cannot
issue a dmesg.  Similarly, the panic won't appear in
/var/log/messages.  A BUG message (sort of like a non-fatal panic)
doesn't shut things down, so for a BUG both dmesg and
/var/log/messages will show it.
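
For the non-fatal case, something like the following can pull the
interesting lines out of a log.  The function name and the search
patterns are my own guesses at what is worth looking for, not a complete
list:

```shell
# Hypothetical helper: print lines in a log file that look like kernel
# BUG or oops reports.  The patterns are an assumption.
find_kernel_complaints() {
    grep -E 'BUG|Oops|Call Trace' "$1"
}

# Typical use (reading /var/log/messages usually needs root):
#   find_kernel_complaints /var/log/messages
#   dmesg | grep -E 'BUG|Oops|Call Trace'
```

Remember that none of this helps after a real panic, since nothing gets
written by then.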

How can you see console output?  By not running X and just using the
text console.  Pretty restrictive.  Or by using a serial-console (more
about this later).  In my current testing, I'm leaving the text
console up on the main screen and using a second computer as an X server
for commands that I'm running on the first computer.  (Remember, in X
terminology, the X server is the machine with the screen and the
client is the machine with the program.)  It really helps to have more
than one computer!

The information presented upon a kernel panic is probably more than
the 25 lines that a normal text console has.  So you will lose some
information (perhaps the most important) if you use a normal text
console.  I boot my kernel with the option vga=795 to give me
1280x1024 resolution on my text console; that gives me about 64 lines
of text.  Even this isn't enough (because I have a dual-processor
machine, there are two stack dumps in the panic message).
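
The "about 64 lines" figure is just the screen height divided by the
font height.  A small sketch of the arithmetic, assuming a 16-pixel-high
console font (the font height is my assumption; text_rows is a made-up
name):

```shell
# Rows of text = screen height in pixels / font height in pixels.
# The 16-pixel font height is an assumption (a common 8x16 console font).
text_rows() {
    echo $(( $1 / $2 ))
}

text_rows 1024 16    # 1280x1024 console from vga=795; prints 64
```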

If you don't use the console for a while, it will blank.  Once a crash
happens, you cannot induce the screen to become readable again.  So I
use setterm(1) to stop blanking.
	setterm -powerdown 0 -blank 0

If you don't want to tie up your main display while you wait for a
panic's message, you might consider a serial console.  A terminal (or
other computer) connected to your computer by a built-in serial port
can be used as the console.  I've not tried this because my main
machine doesn't have a serial port.  You can even use a printer port
as an output-only console but I think that this requires a printer
driver that is built-in, not a module as is (I think) the case with
Fedora.  See
  http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/
  http://mbligh.org/linuxdocs/Kernel/SerialConsole
  Documentation/serial-console.txt in kernel source
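
As I said, I have not tried a serial console myself, so treat this as an
untested sketch based on those documents.  The kernel is told about the
serial port with a console= boot option; the port (ttyS0) and speed
(9600n8) below are assumptions for illustration, and has_serial_console
is a made-up helper:

```shell
# Hypothetical helper: succeed if a kernel command line names a
# serial console (console=ttyS...).
has_serial_console() {
    case " $1 " in
        *" console=ttyS"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example boot options (assumed port and speed): kernel messages go to
# both the normal screen and the first serial port.
example='console=tty0 console=ttyS0,9600n8'
if has_serial_console "$example"; then
    echo "serial console configured"
fi

# Check the running kernel:
#   has_serial_console "$(cat /proc/cmdline)" && echo yes
```

The machine on the other end of the cable then needs a terminal program
set to the same speed.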

I see in your /var/log/messages that you are experiencing
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240982
but that should not cause a lockup.  This is an example of a BUG
message.

For an example of a panic message, see an earlier message I wrote to
this list
https://www.redhat.com/archives/fedora-list/2007-June/msg01076.html