Re: [Solved] Re: FC5 S/W Raid Rebuilding to Infinity (and beyond!)

Sean Bruno wrote:
You have found yourself in the same situation I found myself in recently. Actually my situation was slightly different, but the resulting problem is the same. In my case at re-boot md decided that one partition of a mirror was out of sync, and so initiated a re-sync with the other partition. However, the partition which was active contained a bad sector, so the re-sync failed, over and over and over..., just like yours is doing.

In order to fix my system I used the following steps.

The first step is to take the offending filesystem offline. Then I copied the existing partition onto the good disk using dd, with the noerror option so it would continue past read errors. In my case I knew that the read error was not part of the actual filesystem in use because it passed fsck. When the copy was complete I ran fsck on the new filesystem just to be sure it had copied ok.
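For anyone following along, the copy step might look roughly like the sketch below. The device names are only the ones from my case and are assumptions for anyone else, and the script prints the commands for review instead of running them. I have also added conv=sync, which was not mentioned above: it pads a failed read with zeros so that everything after the bad block lands at the same offset on the copy.

```shell
#!/bin/sh
# Sketch only: print the commands for copying a failing partition onto a
# good one. SRC and DST are assumed names; set them for your own system.
SRC=${SRC:-/dev/sdb3}   # partition with the bad sector (assumption)
DST=${DST:-/dev/sda3}   # good partition that receives the copy (assumption)

copy_cmd() {
    # noerror: keep going after a read error.
    # sync:    pad short or failed reads with zeros so offsets stay aligned.
    # bs=4096: a small block size limits how much is zeroed per bad read.
    echo "dd if=$1 of=$2 bs=4096 conv=noerror,sync"
}

check_cmd() {
    # -f forces a full check even if the filesystem is marked clean.
    echo "fsck -f $1"
}

# Review the output, then run the commands as root with the filesystem unmounted.
copy_cmd "$SRC" "$DST"
check_cmd "$DST"
```

Double-check if= and of= before running the real thing, since swapping them overwrites the data you are trying to save.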

After this I created a new RAID consisting of just the good partition (in my case the RAID was md1 and the new partition was sda3):
  # mdadm -C /dev/md1 --force -n 1 -l 1 /dev/sda3

As a temporary fix, until a new disk arrived, I ran
   # e2fsck -c -d -f /dev/sdb3
to mark bad blocks (sdb3 was the failing partition).
Then I ran:
   # mdadm --zero-superblock /dev/sdb3
to remove the md superblock from the partition so it was no longer part of a RAID.

Finally, I used mdadm to add the dodgy partition back into the RAID:

# mdadm -a /dev/md1 /dev/sdb3

and to grow the RAID to 2 partitions:

# mdadm --grow -n 2 /dev/md1
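Once the array has been grown, md rebuilds the mirror onto the re-added partition, and progress shows up in /proc/mdstat. The following is a small sketch for checking whether a given array is still syncing. The function name and the MD and MDSTAT variables are my own inventions for illustration, and the parsing assumes the usual mdstat layout, where each array's section ends with a blank line.

```shell
#!/bin/sh
# Sketch only: report whether an md array is currently rebuilding.
MD=${MD:-md1}                    # array name as it appears in mdstat (assumption)
MDSTAT=${MDSTAT:-/proc/mdstat}   # overridable so the parser can be tried on a saved copy

md_is_syncing() {
    # $1 = array name, $2 = mdstat file.
    # Succeeds if the array's section contains a "recovery =" or "resync =" line.
    awk -v md="$1" '
        $1 == md                         { in_md = 1; next }
        in_md && NF == 0                 { in_md = 0 }
        in_md && /(recovery|resync) *=/  { found = 1 }
        END                              { exit !found }
    ' "$2"
}

if [ -r "$MDSTAT" ]; then
    if md_is_syncing "$MD" "$MDSTAT"; then
        echo "$MD is still rebuilding"
    else
        echo "$MD is not rebuilding"
    fi
fi
```

If the array keeps restarting the rebuild from the beginning, the kernel log usually shows the read error that aborted it, and "mdadm --detail /dev/md1" shows the state of each member.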

Thanks for the assistance with this Nigel.  I was able to recover from
this 'double' failure with your procedure.  I had purchased 2 new disks
in order to replace the failed drives and I am back up at this time.

Sean



You may want to do some additional testing to verify the status of the new filesystem. In my original message I implied that fsck was sufficient, but as Tony quite rightly pointed out, it isn't. On my failing disk I knew that the bad block wasn't part of the active filesystem, so a simple copy/fsck was sufficient. During the copy there were no errors, and a comparison of the two filesystems showed no discrepancies.
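One way to do such a comparison is sketched below, assuming both filesystems can be mounted read-only at the same time. The mount points and the function name are hypothetical, and this is not necessarily the exact method I used.

```shell
#!/bin/sh
# Sketch only: compare the old and new filesystems file by file.
# Mount both read-only first, for example (as root):
#   mount -o ro /dev/sdb3 /mnt/old
#   mount -o ro /dev/sda3 /mnt/new
OLD_MNT=${OLD_MNT:-/mnt/old}   # hypothetical mount point of the failing copy
NEW_MNT=${NEW_MNT:-/mnt/new}   # hypothetical mount point of the new copy

trees_match() {
    # diff -r walks both trees; -q lists only the names of files that differ.
    # Exit status is 0 when nothing differs, non-zero otherwise.
    diff -rq "$1" "$2"
}

if [ -d "$OLD_MNT" ] && [ -d "$NEW_MNT" ]; then
    if trees_match "$OLD_MNT" "$NEW_MNT"; then
        echo "no differences found"
    else
        echo "differences or read errors found, see the messages above"
    fi
fi
```

A read error reported while comparing is useful in itself, since it names a file that sits on a bad block.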

When you copied your filesystem, did the system generate any error messages? If so, you will probably want to investigate which file the bad block belonged to, and determine the impact that having that file corrupted might cause, and whether you can restore that file from a backup.
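If the kernel log reported a failing sector, the sketch below shows one way to track down the file on an ext2/ext3 filesystem. It assumes 512-byte sectors, that the logged sector number is relative to the start of the whole disk, and that you have looked up the partition's start sector (for instance with "fdisk -lu") and the filesystem block size (for instance with "tune2fs -l"). The function name and all the numbers are made up for illustration.

```shell
#!/bin/sh
# Sketch only: convert a failing disk sector into a filesystem block number
# and print the debugfs queries that map it to a file.

fs_block_from_sector() {
    # $1 = failing sector on the disk, $2 = first sector of the partition,
    # $3 = filesystem block size in bytes. Assumes 512-byte sectors.
    echo $(( ($1 - $2) * 512 / $3 ))
}

DEV=${DEV:-/dev/sdb3}   # failing partition (assumption)
BLOCK=$(fs_block_from_sector 1000063 63 4096)   # illustrative numbers only

# icheck maps a block to the inode that owns it, and ncheck maps that inode
# to a path name. INODE stands for whatever number icheck reports.
echo "debugfs -R \"icheck $BLOCK\" $DEV"
echo "debugfs -R \"ncheck INODE\" $DEV"
```

If icheck reports that no inode owns the block, the bad sector is in free space and no file was affected.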

--
Nigel Wade, System Administrator, Space Plasma Physics Group,
            University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw@xxxxxxxxxxxx
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555

