Fedora Users — Re: Catastrophic disk failure, where was smartd?

To: For users of Fedora <fedora-list@xxxxxxxxxx>

Subject: Re: Catastrophic disk failure, where was smartd?

From: Roger Heflin <rogerheflin@xxxxxxxxx>

Date: Wed, 26 Mar 2008 13:28:01 -0500

Cc: "David G. Mackay" <mackay_d@xxxxxxxxxxxxx>

In-reply-to: <20080326172745.GA8208@xxxxxxxx>

References: <1206538549.3785.91.camel@xxxxxxxxxxxxxxxxx> <20080326172745.GA8208@xxxxxxxx>

Reply-to: For users of Fedora <fedora-list@xxxxxxxxxx>

User-agent: Thunderbird 2.0.0.9 (X11/20071115)

Bruno Wolff III wrote:

On Wed, Mar 26, 2008 at 08:35:49 -0500,
  "David G. Mackay" <mackay_d@xxxxxxxxxxxxx> wrote:

Shouldn't there have been some indication of problems prior to the
failure?

Only if you are lucky. Someone at Google published some information about
SMART around a year ago. In a high percentage of cases where catastrophic
failures occur, there is no warning from SMART.

The big issue is that most SMART implementations don't scan the disk for bad blocks. In my experience several years ago with 1000+ disks in service, the #1 failure was bad blocks, and SMART did little to catch that. The #2 failure was failure to spin up at all, but this seemed to be confined to certain batches.

One thing that I would do is run a simple "dd if=/dev/sdx of=/dev/null bs=1M" on all of my disks, maybe once a week or once a month, to scan them myself. If the disk detects a sector getting too many errors (still correctable with the extra error-correction bits it has), it will move the data from the bad sector to a spare and mark the bad sector bad, and I believe SMART counts when this has been done.
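As a rough sketch of that routine, something like the following could be run from cron. The function names are made up for illustration, /dev/sdx is a placeholder, and it assumes GNU dd and smartmontools are installed. It also assumes the drive reports remapped sectors under the attribute name Reallocated_Sector_Ct with the raw value in the tenth column of "smartctl -A" output, which is common but not universal.

```shell
#!/bin/sh
# Read every block of a device (or file) and throw the data away.
# dd exits non-zero if the source cannot be opened or a read fails.
scan_device() {
    dd if="$1" of=/dev/null bs=1M 2>/dev/null
}

# Pull the raw Reallocated_Sector_Ct value out of `smartctl -A` output
# supplied on stdin. Assumption: the attribute name and the 10-column
# table layout match what this drive reports.
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Example use (placeholder device name, run as root):
#   scan_device /dev/sdx || echo "read error on /dev/sdx"
#   smartctl -A /dev/sdx | reallocated_count
```

Comparing the reallocated count from one run to the next would show whether the scan is causing sectors to be remapped, which is the trend worth watching.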

                               Roger