Roger Heflin wrote:
The big issue is that most SMART implementations don't scan
the disk for bad blocks, and in my experience several years ago with
1000+ disks in service, the #1 failure was bad blocks, and
SMART did little to catch it. The #2 failure was failure to spin
up at all, but that seemed to be confined to certain batches.
Isn't that what the long surface scan test is supposed to do?
Probably. I started using the dd test before disks, Linux, and other
OSes supported SMART. It works on any disk (or array) whether SMART
works or not.
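For reference, the dd test mentioned here is just a sequential read of the whole device with the data thrown away. A minimal sketch (the `scan_disk` function name and `/dev/sdX` device are placeholders, not anything from the thread):

```shell
# Read-only surface scan: read every sector sequentially and discard it.
# Any I/O error dd prints marks a sector the drive could not deliver
# even after its own internal retries.
scan_disk() {
    # $1 is the device (or file) to test, e.g. /dev/sdX (placeholder)
    dd if="$1" of=/dev/null bs=1M conv=noerror
}

# Usage, as root, against the disk under test:
#   scan_disk /dev/sdX
```

`conv=noerror` keeps dd reading past bad spots so one error doesn't abort the scan, which is what you want when counting how many sectors are unreadable.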
That only catches 'hard' errors. Modern drives have spare sectors and
the ability to remap soft errors internally, up to a point, before the
OS knows anything about them. If the OS (or dd) sees an error, it means
you've used up the spares or the internal retries weren't able to fix
it. The SMART interface is supposed to let you know how far along you are
in using up the internal correction and how often soft errors are hidden
by the retries. It seems good in theory, and if it predicts the drive
is going bad you should probably believe it. But, I think a lot of
drives fail faster than the internal corrections can handle so you often
don't get any warning.
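The remap-tracking attributes being described are exposed through the standard SMART attribute table; with smartmontools you would read them with `smartctl -A /dev/sdX`. A sketch of pulling out the three spare-consumption counters, using a canned sample of the attribute table so the filter can be shown end to end (the sample values are made up; the attribute names and IDs are the standard ATA ones):

```shell
# Live command would be:  smartctl -A /dev/sdX
# Canned sample of three rows of its attribute table (values invented):
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0'

# Columns: ID, name, flags, value, worst, thresh, type, updated,
# when-failed, raw. A rising raw value on any of these three means the
# drive is quietly consuming its spare sectors.
printf '%s\n' "$sample" | awk '$1==5 || $1==197 || $1==198 { print $2, "raw =", $10 }'
# prints:
#   Reallocated_Sector_Ct raw = 12
#   Current_Pending_Sector raw = 3
#   Offline_Uncorrectable raw = 0
```

Which matches the point above: by the time dd sees an error, these raw counters have usually already been climbing for a while, assuming the drive fails slowly enough to give any warning at all.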