Fedora Users — Re: hard drive problems

On Mon, 06 Feb 2006 21:57:29 -0500
Gene Heskett <gene.heskett@xxxxxxxxxxx> wrote:

> On Monday 06 February 2006 21:30, zanecb@xxxxxxxxxxxxxxxxxxxxxxx wrote:
> >> Zane C. B. wrote:
> >>> hdb: lost interrupt
> >>> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> >>> ide: failed opcode was: unknown
> >>> hda: drive not ready for command
> >>> hda: irq timeout: status=0xd0 { Busy }
> >>> ide: failed opcode was: unknown
> >>> ide0: reset: success
> >>>
> >>> Any ideas what is happening or suggestions for testing for what is
> >>> happening?
> >>>
> >>> smartctl -H /dev/hda and smartctl -l error /dev/hda show the drive
> >>> as being good.
> >>
> >> Download and run the manufacturer's diagnostic tools (full barrage
> >> of tests) to eliminate hard drive faults first. Then we can see
> >> whether we're looking at a controller issue.
> >
> > Actually, I'm trying to avoid this so I don't have to take the machine
> > out of production.
> 
> If it's indeed that important, then DO IT NOW while you still have data
> that can be recovered when it does curl up its toes. If the data is
> valuable, then production must understand that their baby needs a fresh
> diaper. If they can't do that, then I assume you can get overtime
> approval to check it after hours? But by then, it may well be too
> late. IBM published some papers a few years ago about how they were
> attempting to arrive at some sort of meaningful indicator of
> impending drive failure, but their best work at the time could give
> only a 20-minute warning, and this was in the heyday of Deathstars. Apply
> Moore's Law, and it might be 8 hours today. Emphasis on the might...
> 
> I trust that you do have backups?  Don't you?

Not worried about it going down because of that.