Re: Random reboots

On Tue, Feb 14, 2006 at 09:54:47AM +0100, Erik Mouw wrote:
> On Mon, Feb 13, 2006 at 04:49:57PM -0500, Ryan Richter wrote:
> > On Mon, Feb 13, 2006 at 04:39:29PM -0500, ryan wrote:
> > > It runs Debian Sarge for AMD64.  I have lots of other machines, but only
> > > this one gets the reboots.  None of the others have SCSI, and none are
> > > dual-CPU with memory on both nodes, just to name two obvious things
> > > different on this machine.
> > 
> > Thinking about this some more...  My home desktop also is a dual opteron
> > with memory on both nodes and SCSI, but it hasn't had any reboots.  The
> > machine with the reboot trouble uses RAID5+LVM, unlike my desktop.  Also
> > it's an NFS server, but I have another machine (single-cpu pentium 4, no
> > SCSI etc.) that's an NFS server without reboots.  But none of the other
> > machines have RAID or LVM.
> 
> We recently had such an issue with a dual AMD64 machine rebooting at
> mke2fs. It turned out it was a faulty power supply. After we changed
> the power supply, everything ran smooth again.
> 
> You could start to test by powering your drives from an old AT-style
> power supply leaving more "juice" for the main board and CPUs.

It's possible, but I doubt it.  More often than not, the reboot happens
when the machine is completely idle - in fact, I can't remember a single
time when it wasn't idle.  I just spent a couple of months debugging a
SCSI-tape crash, during which I ran the backups a lot and had lots of
RAID resyncs, and it *never* rebooted during either of those.  Anyway,
it has quite a large 2+1 redundant power supply and, like I said, we
routinely had 3+ months of uptime with older kernels.

Over the years I've had this machine, I've experienced at least 10-15
strange kernel bugs that only happened on this machine.  Each and every
time I was *convinced* that the hardware was at fault (and people on the
mailing list suggested as much) until either a kernel came out that
fixed the problem or a kernel developer positively identified it as a
kernel problem and eventually fixed it.  This machine just seems to be a
magnet for kernel bugs.

Thanks,
-ryan