On Mon, 06 Feb 2006 21:57:29 -0500
Gene Heskett <gene.heskett@xxxxxxxxxxx> wrote:

> On Monday 06 February 2006 21:30, zanecb@xxxxxxxxxxxxxxxxxxxxxxx wrote:
> >> Zane C. B. wrote:
> >>> hdb: lost interrupt
> >>> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> >>> ide: failed opcode was: unknown
> >>> hda: drive not ready for command
> >>> hda: irq timeout: status=0xd0 { Busy }
> >>> ide: failed opcode was: unknown
> >>> ide0: reset: success
> >>>
> >>> Any ideas what is happening or suggestions for testing for what is
> >>> happening?
> >>>
> >>> smartctl -H /dev/hda and smartctl -l error /dev/hda show the drive
> >>> as being good.
> >>
> >> Download and run the manufacturer's diagnostics tools (full barrage
> >> of tests) to eliminate hard drive faults first. Then we can see
> >> whether we're looking at a controller issue?
> >
> > Actually trying to avoid this so I don't have to take the machine out
> > of production.
>
> If it's indeed that important, then DO IT NOW while you still have data
> that can be recovered when it does curl up its toes. If the data is
> valuable, then production must understand that their baby needs a fresh
> diaper. If they can't do that, then I assume you can get an overtime
> approval to check it after hours? But by then, it may well be too
> late. IBM published some papers a few years ago about how they were
> attempting to arrive at some sort of a meaningful indicator of
> impending drive failure, but their best work at the time could give
> only a 20 minute warning, this in the heyday of deathstars. Apply
> Moore's Law, and it might be 8 hours today. Emphasis on the might...
>
> I trust that you do have backups? Don't you?

Not worried about it going down because of that.
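
For reference, the status bytes in the quoted kernel messages can be decoded by hand. Below is a minimal Python sketch, assuming the standard ATA status register bit layout. The helper name decode_status is made up for illustration, and the flag names for bits that do not appear in the log (DeviceFault, CorrectedError, Index, Error) are my assumption of what the kernel would print. Reporting Busy alone follows what the log itself shows: 0xd0 also has the ready and seek bits set, yet only Busy is listed.

```python
# Bit masks of the ATA status register (assumed standard layout).
BUSY = 0x80
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),      # name assumed, not seen in the log
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),   # name assumed, not seen in the log
    (0x02, "Index"),            # name assumed, not seen in the log
    (0x01, "Error"),            # name assumed, not seen in the log
]


def decode_status(status):
    """Hypothetical helper: list the flag names set in an ATA status byte."""
    if status & BUSY:
        # While the drive is busy the other bits are not meaningful,
        # so report Busy on its own, as the log does for 0xd0.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0x58, 0xD0):
        flags = " ".join(decode_status(value))
        print(f"status=0x{value:02x} {{ {flags} }}")
```

Run against the two values from the log, this reproduces the flag lists shown there, which suggests hda went from ready and requesting a data transfer to stuck busy until the interface reset.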