Re: Hardware bug or kernel bug?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thursday 12 October 2006 20:13, you wrote:
>
> A reboot usually indicates a serious hardware problem - it could be an
> overheating sensor tripping, but it could be some serious corruption
> causing a triple-fault or something like that too.
>
> But the _most_ likely problem is just the power supply. If your power
> supply is border-line, having something that stresses CPU, disk,
> southbridge and networking at the same time may be just the way to cause a
> power-fail signal, which usually causes an instant reboot.

The power supplies in both machines on which I'm seeing the problem are brand 
new, supposedly good quality and from different manufacturers. Could it be 
that the motherboard has some fault which causes it to overload even good 
power supplies?

> I think it just changes timings, and there is something timing-related
> going on - like just instant power draw. The timer frequency should not
> have any serious impact on heat, so I doubt it's about overheating, but
> it's certainly worth opening the case and using one of those
> compressed-air things to cool down the CPU and/or southbridge chips.

The motherboard has all the usual heat sensors and will alarm if something 
gets too hot - I suspected overheating the first time this happened and 
checked the temps in the BIOS, but everything was well within limits.

> Interrupts generally aren't problematic, I'd be more likely to suspect CPU
> overclocking or similar (does the cpuinfo match the frequency claimed by
> the BIOS?) or just some strange motherboard problem (which could be
> firmware: bad programming of memory timings etc). So a BIOS upgrade is
> worth looking into.

The cpuinfo does indeed match the reported BIOS speed. The boards are already 
running the latest BIOS, so if it is a BIOS issue the motherboard 
manufacturer isn't aware of it...

> Soemtimes issues like this can be worked around - for example, maybe the
> problem is the chipset having issues with concurrent DMA or something, so
> turning off DMA on the disk drives could possibly at least _hide_ the
> problem.

I should have mentioned that of the two machines that are having the problem, 
one is using IDE and the other SATA. The SATA machine seems worst affected by 
it.

> But check the power supply first. And check to see if there is a BIOS
> upgrade available. You can double-check the cooling: check that all
> heat-sinks are properly seated and have appropriate amounts of thermal
> grease. And blowing air from a compressed-air can on top of the things
> until you see the frost over is certainly a good spot-check.

OK I'll give all that a go.

Thanks for your help,
David.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux