Re: RAID drive failed, but SMART shows no errors?

Mogens Kjaer writes:

Sam Varshavchik wrote:
...
But smartctl gives this drive a clean bill of health:

[root@headache ~]# smartctl -H /dev/sda
smartctl version 5.36 [i386-redhat-linux-gnu] Copyright © 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Health Status: OK

Try running a SMART test on the drive:

smartctl -t long /dev/sda

It will tell you how long the test takes to run;
you'll have to poll once in a while with

smartctl -a /dev/sda

to get the result of the test. It will be at the end:

SMART Self-test log
Num  Test             Status     segment  LifeTime  LBA_first_err  [SK ASC ASQ]
     Description                 number   (hours)
# 1  Background long  Completed       -      12641        -        [-   -   -]
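Once the log shows a result, the status field can be pulled out with a short script. A sketch: the here-doc below is a canned sample standing in for live `smartctl -a /dev/sda` output, and the awk pattern assumes the `# 1` line format shown above.

```shell
# Extract the latest self-test status from smartctl output.
# The here-doc is a stand-in for `smartctl -a /dev/sda` on a live system.
result=$(awk '/^# 1/ {print $5}' <<'EOF'
SMART Self-test log
Num  Test             Status     segment  LifeTime  LBA_first_err  [SK ASC ASQ]
     Description                 number   (hours)
# 1  Background long  Completed       -      12641        -        [-   -   -]
EOF
)
echo "latest self-test: $result"
```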

Came up clean. Nothing shown for LBA_first_err. But the fact remains that the drive did err out. smartctl -a shows "Elements in grown defect list: 3", so I suppose that it remapped 3 sectors. I can't find anything in the output that tells me how many spare sectors are available for remapping.

Also, despite the 3 defects, in the "Error counter log" portion, both read and write show "0" for "Total uncorrected errors", so I'm not sure how to reconcile that. Sounds to me like the drive successfully remapped a few defects and informed the host about it, but the kernel interpreted the result as a permanent error and took the partition out of the RAID array.
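For what it's worth, the defect count can be pulled out of the output mechanically. A sketch: the sample string here stands in for the live `smartctl -a /dev/sda` output.

```shell
# Parse the grown-defect count from smartctl output. The sample line
# below is a stand-in for the real output of `smartctl -a /dev/sda`.
sample='Elements in grown defect list: 3'
defects=$(printf '%s\n' "$sample" | awk -F': *' '/grown defect list/ {print $2}')
echo "grown defects: $defects"
```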

I have three RAID-1 partitions on these disks. The one that reported an error was the largest one. I dropped the degraded partition, and hot-added it back. Immediately, another error was logged to /var/log/messages, for the same block, but despite the error, the kernel started resyncing the array:
...
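For the record, the drop/re-add cycle above looks roughly like this with mdadm. A sketch only: /dev/md0 and /dev/sda3 are assumed names (substitute the actual array and partition), and DRY_RUN=1, the default here, prints the commands instead of executing them.

```shell
# Remove the failed RAID-1 member and hot-add it back (sketch; /dev/md0
# and /dev/sda3 are placeholder names -- use your array and partition).
# DRY_RUN=1 (the default) echoes each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run mdadm /dev/md0 --fail /dev/sda3     # mark the member failed (if not already)
run mdadm /dev/md0 --remove /dev/sda3   # drop it from the array
run mdadm /dev/md0 --add /dev/sda3      # hot-add; kicks off the resync
run cat /proc/mdstat                    # watch the resync progress
```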

If it were me, I would replace this disk. The next time you
run into this read error could be when sdb fails and you try
to resync a new sdb :-(

Yeah, I'm going to do that. But, with a clean long test, I think I have some breathing room to wait a few days for a convenient time to do it.

If I cannot do this, my third question is what do I need to do, grub-wise, to be able to swap sdb with sda? sda is the one that's failing the RAID-1 array. If I can't hot-swap it, I'll need to replace it with the sdb drive, but right now grub is installed only on sda, so how do I install a copy of all the grub boot-related stuff on sdb?

Hm? If you used the GUI to create the RAID partitions during
installation, GRUB should be on both drives.

No, I don't believe I used a GUI; I believe this was originally a text install. grub-install takes a device parameter, according to its man page. It's not entirely clear, but I think that passing it /dev/sdb will install it to the second drive. But reading the man page's description of the --root-directory parameter muddled things a bit; I'm not sure I understand its purpose.
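My understanding, as a sketch only: plain `grub-install /dev/sdb` writes the legacy GRUB boot code to sdb's MBR, and --root-directory only matters when the tree containing /boot isn't mounted where grub-install expects it, e.g. from a rescue environment with the root filesystem on /mnt. DRY_RUN=1, the default here, prints the commands instead of running them.

```shell
# Put legacy GRUB's boot code on the second drive (sketch; assumes
# /boot lives on the RAID-1 and is mounted normally).
# DRY_RUN=1 (the default) echoes each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run grub-install /dev/sdb     # write stage1 to sdb's MBR

# From a rescue environment with the root filesystem mounted at /mnt,
# --root-directory points grub-install at that tree's /boot instead:
run grub-install --root-directory=/mnt /dev/sdb
```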


