On Monday 06 February 2006 21:30, zanecb@xxxxxxxxxxxxxxxxxxxxxxx wrote:
>> Zane C. B. wrote:
>>> hdb: lost interrupt
>>> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> ide: failed opcode was: unknown
>>> hda: drive not ready for command
>>> hda: irq timeout: status=0xd0 { Busy }
>>> ide: failed opcode was: unknown
>>> ide0: reset: success
>>>
>>> Any ideas what is happening or suggestions for testing for what is
>>> happening?
>>>
>>> smartctl -H /dev/hda and smartctl -l error /dev/hda show the drive
>>> as being good.
>>
>> Download and run the manufacturer's diagnostics tools (full barrage
>> of tests) to eliminate hard drive faults first. Then we can see
>> whether we're looking at a controller issue?
>
>Actually trying to avoid this so I don't have to take the machine out
>of production.

If it's indeed that important, then DO IT NOW, while you still have data
that can be recovered when it does curl up its toes. If the data is
valuable, then production must understand that their baby needs a fresh
diaper. If they can't accept that, then I assume you can get overtime
approval to check it after hours? But by then, it may well be too late.

IBM published some papers a few years ago about how they were attempting
to arrive at some sort of meaningful indicator of impending drive
failure, but their best work at the time could give only a 20-minute
warning, and this was in the heyday of the Deathstars. Apply Moore's Law,
and it might be 8 hours today. Emphasis on the might...

I trust that you do have backups? Don't you?

Common sense... Why is it so uncommon?

-- 
Cheers, Gene
People having trouble with vz bouncing email to me should add the word
'online' between the 'verizon' and the dot, which bypasses vz's stupid
bounce rules. I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.