Bruno Wolff III wrote:
On Wed, Mar 26, 2008 at 08:35:49 -0500,
"David G. Mackay" <mackay_d@xxxxxxxxxxxxx> wrote:
Shouldn't there have been some indication of problems prior to the
failure?
Only if you are lucky. Someone at Google published some information about
SMART around a year ago. In a high percentage of the cases where a
catastrophic failure occurred, there was no warning from SMART.
The big issue is that most SMART implementations don't scan the disk for
bad blocks. In my experience several years ago with 1000+ disks in
service, the #1 failure was bad blocks, and SMART did little to catch
that. The #2 failure was failure to spin up at all, but this seemed to be
confined to certain batches.
One thing that I would do is run a simple "dd if=/dev/sdx of=/dev/null bs=1M" on
all of my disks, maybe once a week or once a month, to scan them yourself. If the
disk detects a sector getting too many errors (still correctable with the extra
bits it has), it will move the data from the bad sector to a spare and mark
the bad sector bad, and I believe SMART counts when this has been done.
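
For anyone who wants to automate that read pass, something along these lines
might do as a starting point. It is only a rough sketch: the function name
scan_disk and the idea of passing the device names as arguments are my own
assumptions, not anything standard, and it does nothing beyond the dd read
described above.

```shell
#!/bin/sh
# Rough sketch: read a whole device (or file) and throw the data away, so
# the drive gets a chance to notice weak sectors during the read.
# scan_disk is a made-up helper name; it returns dd's exit status.
scan_disk() {
    # dd prints both its transfer statistics and any read errors on stderr;
    # remove the redirect if you want to see them.
    dd if="$1" of=/dev/null bs=1M 2>/dev/null
}

# Devices come from the command line, e.g. /dev/sda /dev/sdb.
for dev in "$@"; do
    if scan_disk "$dev"; then
        echo "$dev: read OK"
    else
        echo "$dev: READ ERROR" >&2
    fi
done
```

Run as root from cron it would look something like
"sh scan-disks.sh /dev/sda /dev/sdb" (the script name is made up). Afterwards,
"smartctl -A /dev/sda" from smartmontools lists the drive's attributes; on
drives that report it, attribute 5 (Reallocated_Sector_Ct) is the count of
remapped sectors.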
Roger