Re: How to debug complete kernel lock-ups

On 10/31/07, John Sigler <[email protected]> wrote:
> "It seems that the PCI clock on this system has a rather large over- and
> undershoot and we suspect that the undershoot (of ~1V) is causing a drop
> in the core voltage of the on-board FPGA which results in lockup of the
> firmware. Both the under- and overshoot are well outside the allowed
> ranges (high=VCC+0.5V and low=-0.5V) of the PCI specification and a
> premature conclusion might be that the system does not comply to the PCI
> spec and that this is the cause of the lockup on this PC."
>
> This is waaay out of my league, as my area is software.
>
> Is it typical for voltage issues to hang hardware?

Yes, if the voltage is applied (or lacking) at the right place.

> Is it typical for one PCI board locking up to nail the entire system?

This doesn't appear to be a case of the *board* crashing, but rather
the board taking the PCI bus and the related hardware on the
motherboard down with it. Once the bus is down, anything that has to
go through it (on a PC, that's pretty much everything) is inaccessible.

> I don't understand why the lockup would only happen when I write to the
> 4 ports within a small time frame, and not when I only write to 2 ports
> (either one port on each card, or 2 ports on the same card). I suspected
> some kind of concurrency issue...

No, given the hardware guy's description, it's a power issue. Perhaps
each write to a port makes the card draw more power? If so, four
ports = 4 * the extra draw. When the current load increases, the
supply voltage drops, and if you underpower a chip, it's going to
lose its little head.

> I suppose the next logical step is to get the board's engineers
> and the system's engineers duke it out? :-)

Yes, all signs point to it being a pure hardware issue. You may be
able to work around it in software by initializing a counting
semaphore to 2 and holding it around every port write, so that
you'll never be writing to more than 2 ports at the same time until
the hardware guys figure it out.
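
To make the semaphore idea concrete, here is a rough user-space sketch
using POSIX threads and semaphores on Linux. It is not your driver:
write_port(), run_writers(), NUM_PORTS and MAX_CONCURRENT are names I
made up for illustration, and the 1 ms sleep only stands in for the
real port I/O. In kernel code the same pattern would use a struct
semaphore with sema_init(), down() and up() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

#define NUM_PORTS      4   /* four ports, as in the failing case */
#define MAX_CONCURRENT 2   /* never more than two writes in flight */

static sem_t write_slots;  /* counting semaphore, initialized to 2 */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight;      /* writes currently in progress */
static int peak_in_flight; /* highest value in_flight ever reached */

/* Hypothetical stand-in for the real port write: just sleeps 1 ms. */
static void write_port(int port)
{
    struct timespec ts = { 0, 1000000L };
    (void)port;
    nanosleep(&ts, NULL);
}

static void *port_writer(void *arg)
{
    int port = *(int *)arg;

    for (int i = 0; i < 50; i++) {
        /* Take one of the two slots; retry if a signal interrupts us. */
        while (sem_wait(&write_slots) == -1 && errno == EINTR)
            ;

        pthread_mutex_lock(&stats_lock);
        if (++in_flight > peak_in_flight)
            peak_in_flight = in_flight;
        pthread_mutex_unlock(&stats_lock);

        write_port(port);

        pthread_mutex_lock(&stats_lock);
        in_flight--;
        pthread_mutex_unlock(&stats_lock);

        sem_post(&write_slots);  /* give the slot back */
    }
    return NULL;
}

/* Runs one writer thread per port; returns the peak concurrency seen. */
int run_writers(void)
{
    pthread_t threads[NUM_PORTS];
    int ports[NUM_PORTS];
    int rc;

    in_flight = 0;
    peak_in_flight = 0;
    rc = sem_init(&write_slots, 0, MAX_CONCURRENT);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < NUM_PORTS; i++) {
        ports[i] = i;
        rc = pthread_create(&threads[i], NULL, port_writer, &ports[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_PORTS; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&write_slots);
    return peak_in_flight;
}
```

Call run_writers() from your own main() (build with -pthread); the
value it returns should never exceed MAX_CONCURRENT, even though all
four writer threads are running. The semaphore only caps how many
writes overlap; it does nothing about the undershoot itself, so treat
it as a stopgap.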

Ray