Hi, jludwig,

Thanks. All our Linux boxes run at UDMA 5, 24 hours a day, 365 days a year, and most of the failed hard drives had been running for about 3 years, so I assume a drive is dying as soon as the first errors appear. My solution for a failed hard drive is therefore to back up the data files and restore them to a new disk of the same size when the first disk error messages appear, then return the failed drive to Maxtor.

Since we have thousands of hard drives, it seems impractical to run a low-level format on failed drives; that seems to be the work of the vendor, Maxtor, rather than of end users.

Thanks a lot again for your helpful analysis and suggestions!

--Guolin Cheng

-----Original Message-----
From: jludwig [mailto:wralphie@xxxxxxxxxxx]
Sent: Friday, April 30, 2004 2:40 PM
To: For users of Fedora Core releases
Subject: RE: disk problems or false alarm??

On Fri, 2004-04-30 at 15:01, Guolin Cheng wrote:
> Hi, jludwig,
>
> Thanks for your helpful information.
>
> Because I'm running Linux, I assume there are no viruses. That leaves
> several questions:
>
> 1, How can I know whether all the spare sectors are in use and the disk
> will lose data, or whether it is just the beginning of disk failure?
>

There is no real way to know whether you are using spare sectors (even new drives use a few, since perfect media is rare), because this is handled by the drive's firmware and happens automatically.

> 2, How can I identify that a hard drive is dying at the first
> minute?

Run the smartd daemon < chkconfig smartd on >

> 3, How do I identify the malfunctioning hard drives? Should I idle the
> machine and test the hard drives one by one to figure it out? Mostly it is
> the failure-reporting hard drive that has failed, but I remember for sure
> that, in a few cases, one of the other hard drives had failed instead.

The only way to really check a hard drive is multiple 100% read/write passes over every sector. Needless to say, the drive must be taken out of service and all data is removed from it.
>
> 4, Should I replace hard drives when I first see this kind of disk error
> message, in case data begins to be lost?

When you see this, it usually indicates that a drive has used up all its spares. When you do see it:

1) Back up your data.
2) Watch for another R/W failure.
3) Depending on the nature of the drive and system, have a new drive ready.
4) Don't assume the drive has failed or lost sectors.

I have had drives that were "thrown out" when all that was really needed was a factory "low level format", which rechecks all sectors. (This is not a true low-level format, which can only be done at the factory or another facility with the proper equipment.)

> Thanks a LOT...
>
> --Guolin Cheng
>
Snip
--
jludwig <wralphie@xxxxxxxxxxx>
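
As a rough companion to the smartd suggestion in the thread, here is a minimal Python sketch of pulling the Reallocated_Sector_Ct attribute out of `smartctl -A` output (smartctl ships in the same smartmontools package as smartd). It is only a sketch under stated assumptions: the device name `/dev/hda`, the helper names, and the sample text are illustrative and not taken from the thread, and how faithfully the raw value reflects spare-sector usage depends on the drive's firmware.

```python
import subprocess
from typing import Optional

# Illustrative sample of `smartctl -A` attribute output (made-up values,
# not taken from any drive discussed in the thread).
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   253   252   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   210   210   000    Old_age   Always       -       26280
"""


def reallocated_sectors(smartctl_output: str) -> Optional[int]:
    """Return the raw Reallocated_Sector_Ct value, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def read_attributes(device: str) -> str:
    """Run `smartctl -A` on a device (usually needs root) and return its stdout."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to flag drive status
    )
    return result.stdout


if __name__ == "__main__":
    # Parse the canned sample; on a real box, pass read_attributes("/dev/hda")
    # instead (the device name is an assumption).
    print("Reallocated sectors in sample:", reallocated_sectors(SAMPLE_OUTPUT))
```

With thousands of drives, something like this could be run per device from cron and a drive flagged when the count rises between runs; whether a rising count justifies an immediate swap is the same judgment call discussed above.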