Re: [smartmontools-support] The Death and Diagnosis of a Dying Hard Drive - Is S.M.A.R.T. useful?

Theodore Tso wrote:

The real question though is whether the disk continues to work OK from this point forward, or whether it is a prelude to an ever-increasing number of bad blocks. If it is the latter, and S.M.A.R.T. still didn't give any warning, then it would certainly be an indictment of that particular manufacturer's S.M.A.R.T. implementation.

I have a practical suggestion. Most recent disk drives have a new type of self-test option called 'selective self-tests'. This allows you to run a self-test on up to five user-defined ranges of LBAs. For example, if you suspect that LBA=12345678 is failing, then instead of having to wait an hour or two for the entire disk surface to be scanned, you can tell the disk to scan (say) the range LBA_1=12345000 to LBA_2=12345999 five times in a row, which takes only a few seconds. By repeating this process many times you can scan a trouble area on the disk a few thousand times in an hour.
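As a rough sketch of how to pick such a range, the helper below rounds a suspect LBA down to a multiple of the window size and prints the enclosing START-END range. The function name lba_window and the default window of 1000 sectors are my own choices for illustration, taken from the example numbers above; neither the drive nor smartctl requires them.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): print a START-END window
# of LBAs that contains the suspect LBA.
#   $1 = suspect LBA
#   $2 = window size in sectors (assumed default: 1000, as in the example)
lba_window() {
    lba=$1
    size=${2:-1000}
    start=$(( lba / size * size ))   # round down to a multiple of the window size
    end=$(( start + size - 1 ))      # last LBA inside the window
    echo "${start}-${end}"
}

lba_window 12345678    # prints 12345000-12345999
```

The printed range has the START-END form that can follow "select," in a smartctl -t argument.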

For a couple of years, smartmontools smartctl has had the functionality to invoke these selective self-tests if the disk supports them. But (until just last week) it was awkward: it required a kernel built with TASKFILE support enabled, and only worked with some of the IDE drivers. This has changed. Thanks to hard work by Doug Gilbert and Jeff Garzik to build a SAT (SCSI to ATA Translation) layer in libata and to put a SAT interface into smartmontools, anyone can now easily access this functionality via libata on any SATA disk that supports selective self-tests.

Note: no smartmontools release incorporates this yet. You have to build from CVS. Here are the instructions (4 lines):

cvs -d:pserver:[email protected]:/cvsroot/smartmontools login (when prompted for a password, just press Enter)
cvs -d:pserver:[email protected]:/cvsroot/smartmontools co sm5
cd sm5
./autogen.sh && ./configure && make

Here is an example of running a selective self-test five times on the same range of LBAs as above:

[slave0123 ~]# ./smartctl -d sat -t select,12345000-12345999 -t select,12345000-12345999 -t select,12345000-12345999 -t select,12345000-12345999 -t select,12345000-12345999 /dev/sda
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN         STARTING_LBA           ENDING_LBA
   0             12345000             12345999
   1             12345000             12345999
   2             12345000             12345999
   3             12345000             12345999
   4             12345000             12345999
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful. Testing has begun.
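
The command above spells out the same "-t select,..." argument five times by hand. If you would rather generate it, here is a minimal sketch. The function name select_args is hypothetical, and the script only prints the resulting command (a dry run) rather than starting a test.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): build N copies of
# "-t select,RANGE". A selective self-test takes at most five spans.
#   $1 = LBA range as START-END
#   $2 = number of spans, 1 to 5 (assumed default: 5)
select_args() {
    range=$1
    count=${2:-5}
    if [ "$count" -lt 1 ] || [ "$count" -gt 5 ]; then
        echo "span count must be between 1 and 5" >&2
        return 1
    fi
    args=""
    i=0
    while [ "$i" -lt "$count" ]; do
        args="$args -t select,$range"
        i=$(( i + 1 ))
    done
    echo "${args# }"    # strip the leading space
}

# Dry run: print the command line instead of executing it.
# The substitution is deliberately unquoted so it splits into separate words.
echo ./smartctl -d sat $(select_args 12345000-12345999) /dev/sda
```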


Wait a few seconds, then see the results of the selective self-testing:
[slave0123 ~]# ./smartctl -d sat -l selective -l selftest /dev/sda
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed without error       00%      1473         -
# 2  Selective offline   Completed without error       00%      1473         -
# 3  Extended offline    Completed without error       00%      1467         -

SMART Selective self-test log data structure revision number 1
 SPAN   MIN_LBA   MAX_LBA  CURRENT_TEST_STATUS
    1  12345000  12345999  Not_testing
    2  12345000  12345999  Not_testing
    3  12345000  12345999  Not_testing
    4  12345000  12345999  Not_testing
    5  12345000  12345999  Not_testing
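
To repeat this many times without watching it, something along the following lines might do. It is only a sketch: the function names spans_idle, newest_ok and repeat_selective are made up, the polling interval is a guess, and the parsing assumes the log layout shown above (span lines made of three numbers followed by a status, and a newest self-test entry that starts with "# 1"). Check it against the output of your own drive and smartctl build before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the log layout shown in the output above.

# Succeed only if the selective self-test log on stdin contains at least one
# span line and every span line reports Not_testing.
spans_idle() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            spans++
            if ($4 != "Not_testing") busy++
        }
        END { if (spans > 0 && busy == 0) exit 0; exit 1 }
    '
}

# Succeed only if the newest self-test log entry ("# 1") on stdin
# completed without error.
newest_ok() {
    grep -q '^# 1 .*Completed without error'
}

# Run ROUNDS selective self-tests of five spans each over RANGE on DEV,
# stopping at the first round that does not complete cleanly.
#   $1 = device, $2 = START-END range, $3 = number of rounds
repeat_selective() {
    dev=$1; range=$2; rounds=$3
    n=0
    while [ "$n" -lt "$rounds" ]; do
        ./smartctl -d sat -t "select,$range" -t "select,$range" \
            -t "select,$range" -t "select,$range" -t "select,$range" \
            "$dev" >/dev/null
        tries=0
        sleep 2                              # assumed polling interval
        until ./smartctl -d sat -l selective "$dev" | spans_idle; do
            tries=$(( tries + 1 ))
            [ "$tries" -ge 60 ] && { echo "gave up waiting" >&2; return 2; }
            sleep 2
        done
        if ! ./smartctl -d sat -l selftest "$dev" | newest_ok; then
            echo "round $(( n + 1 )): newest self-test entry is not clean" >&2
            return 1
        fi
        n=$(( n + 1 ))
    done
}
```

Nothing runs until you call it, for example: repeat_selective /dev/sda 12345000-12345999 100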

Justin, I hope that this is of some help to you and others with similar issues.

Cheers,
	Bruce