Fedora Users — RE: Replacing failed raid (boot) disk

First piece of the puzzle is solved:

/root/anaconda-ks.cfg says
"bootloader --location=partition"

So that answers the "where?". What remains is the "how?": how to get it installed in the same place on the new disk.
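
For the "how", here is a minimal sketch of what a manual install might look like with GRUB legacy. As far as I understand it, the grub shell takes devices in (hdN,M) form with partitions counted from 0, and "setup" pointed at a partition writes the loader to that partition's first sector rather than to the MBR. The Python below only builds the command text; it does not touch any disk. The helper names grub_device and grub_install_commands are made up for illustration, and mapping sda to hd0 is an assumption that depends on the BIOS drive order.

```python
import re


def grub_device(partition, disk_index=0):
    """Translate a Linux partition name such as 'sda1' into GRUB legacy
    notation such as '(hd0,0)'. GRUB legacy counts partitions from 0.
    disk_index is an assumption: it has to match the BIOS drive order."""
    match = re.fullmatch(r"[sh]d[a-z]+(\d+)", partition)
    if match is None:
        raise ValueError(f"not a partition name: {partition!r}")
    return f"(hd{disk_index},{int(match.group(1)) - 1})"


def grub_install_commands(boot_partition, location="partition", disk_index=0):
    """Build grub-shell commands for an anaconda-style --location value."""
    boot_dev = grub_device(boot_partition, disk_index)
    if location == "partition":
        target = boot_dev             # first sector of the boot partition
    elif location == "mbr":
        target = f"(hd{disk_index})"  # master boot record of the disk
    else:
        raise ValueError(f"unsupported location: {location!r}")
    return [f"root {boot_dev}", f"setup {target}", "quit"]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before typing them
    # into the grub shell from the rescue environment.
    print("\n".join(grub_install_commands("sda1", "partition")))
```

One caveat with --location=partition: the loader in the partition is only reached if the MBR of the disk carries boot code that jumps to the active partition. A blank replacement disk will not have that, so it may need to be set up as well.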


> -----Original Message-----
> From: fedora-list-bounces@xxxxxxxxxx 
> [mailto:fedora-list-bounces@xxxxxxxxxx] On Behalf Of Mark
> Sent: Wednesday, January 18, 2006 4:16 PM
> To: 'For users of Fedora Core releases'
> Subject: RE: Replacing failed raid (boot) disk
> 
> 
> Actually, I just thought of something:
> Would it be easier to copy the boot partition from the mirror 
> server on to the unused partition of the good drive before 
> replacing the bad drive?
> 
> Here is how the drives are partitioned right now (SDA is the 
> bad drive that needs to be replaced):
> 
> sda1 -> /boot
> sda2 -> raid
> sda3 -> raid1 (md0)
> 
> sdb1 -> swap
> sdb2 -> raid1 (md0)
> sdb3 -> unused (the counterpart of sda1)
> 
> BTW, these are SATA drives, in case it matters...
> 
> The good drive and the bad drive have identical partitions, 
> however the order is different. I did not do this 
> intentionally, I tried to keep the order the same, but 
> DiskDruid kept switching around the partitions of sdb on me.
> 
> Could I use sdb as sda, or would this not work, since /boot 
> would then be on sda3, rather than sda1?
> 
> If I could switch them around I could save the part with the 
> rescue disk and do something like this:
> 
> 1. Copy the content of the second server's /boot partition to 
>    sdb3
> 2. Change /etc/fstab so that /boot is on sda3 rather than sda1
> 3. ?? Where do I define which partitions make up md0?
> 4. Install boot loader onto good disk
> 5. Shut down, replace bad sda drive with good sdb drive, plug 
>    new replacement into where sdb used to be.
> 6. Boot (from sda, previously sdb), partition sdb, and get 
>    mdadm to resync md0 onto the new drive.
> 
> This way I would have less downtime, since I do not need to 
> run in rescue mode.
> 
> I would still have the same problems in step 4 that I had 
> with the first version, of course.
> 
> Thanks,
> 
> MARK
> 
> 
> > -----Original Message-----
> > From: fedora-list-bounces@xxxxxxxxxx
> > [mailto:fedora-list-bounces@xxxxxxxxxx] On Behalf Of Mark
> > Sent: Wednesday, January 18, 2006 3:54 PM
> > To: fedora-list@xxxxxxxxxx
> > Subject: Replacing failed raid (boot) disk
> > 
> > 
> > Hi everybody,
> > 
> > I just got this log output a few days ago:
> > Jan 11 15:34:24 webserv1 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
> > Jan 11 15:34:24 webserv1 kernel: ata1: error=0x10 { SectorIdNotFound }
> > Jan 11 15:34:29 webserv1 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
> > Jan 11 15:34:29 webserv1 kernel: ata1: error=0x10 { SectorIdNotFound }
> > Jan 11 15:34:59 webserv1 kernel: ata1: command 0xc8 timeout, stat 0x51 host_stat 0x61
> > Jan 11 15:34:59 webserv1 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
> > Jan 11 15:34:59 webserv1 kernel: ata1: error=0x10 { SectorIdNotFound }
> > Jan 11 15:34:59 webserv1 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
> > Jan 11 15:34:59 webserv1 kernel: sda: Current: sense key: Aborted Command
> > Jan 11 15:34:59 webserv1 kernel:     Additional sense: Recorded entity not found
> > Jan 11 15:34:59 webserv1 kernel: end_request: I/O error, dev sda, sector 11217554
> > Jan 11 15:34:59 webserv1 kernel: raid1: Disk failure on sda3, disabling device.
> > Jan 11 15:34:59 webserv1 kernel:        Operation continuing on 1 devices
> > Jan 11 15:34:59 webserv1 kernel: raid1: sda3: rescheduling sector 6815744
> > Jan 11 15:34:59 webserv1 kernel: raid1: sdb2: redirecting sector 6815744 to another mirror
> > Jan 11 15:34:59 webserv1 kernel: RAID1 conf printout:
> > Jan 11 15:34:59 webserv1 kernel:  --- wd:1 rd:2
> > Jan 11 15:34:59 webserv1 kernel:  disk 0, wo:1, o:0, dev:sda3
> > Jan 11 15:34:59 webserv1 kernel:  disk 1, wo:0, o:1, dev:sdb2
> > Jan 11 15:34:59 webserv1 kernel: RAID1 conf printout:
> > Jan 11 15:34:59 webserv1 kernel:  --- wd:1 rd:2
> > Jan 11 15:34:59 webserv1 kernel:  disk 1, wo:0, o:1, dev:sdb2
> > 
> > 
> > This is on a server with a non-RAID /boot on sda1 and a
> > software RAID1 / partition.
> > 
> > Dell says the HD needs to be replaced, so now I got the
> > replacement hard disk. The problem is: the failed disk is the 
> > one I boot from and the boot partition is not mirrored. So I 
> > can not copy the content of the boot partition, nor get the 
> > fdisk information to partition the new disk the same way as 
> > the old one. What is the best and easiest way to get the new 
> > system up and running as painlessly as possible?
> > 
> > I have a second machine with an identical setup, so I guess I
> > could get the info from that box.
> > 
> > I am thinking I need to:
> > 1. Plug the new disk in and boot from the rescue CD
> > 2. Look up the partition info on the mirror box and partition
> >    the new disk accordingly.
> > 3. Copy the content of the boot partition over from the 
> >    mirrored box
> > 4. Install grub on sda (how!?!?!?)
> > 5. Hopefully boot the machine with the replaced HD and hope 
> >    that mdadm will automatically start synching the raid from 
> >    the good raid disk (sdb)
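
On step 5: as far as I can tell, the resync will not start entirely by itself, because a blank replacement partition carries no RAID superblock. It would have to be added by hand with something like "mdadm /dev/md0 --add /dev/sda3" (device names taken from the layout in this thread). To see whether the array is still degraded, /proc/mdstat can be checked. Below is a small sketch that does this; the function name degraded_arrays is hypothetical and the SAMPLE text is an illustrative example of the mdstat format, not output from this server.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array: status} for every md array whose status string,
    e.g. '_U', shows a missing member (marked with '_')."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            result[current] = status.group(1)
    return result


# Illustrative text only, used when /proc/mdstat is not available.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb2[1]
      8385856 blocks [2/1] [_U]
unused devices: <none>
"""

if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE
    print(degraded_arrays(text))
```

An empty result means no array reports a missing member; any entry means a member still has to be added or has not been picked up yet.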
> > 
> > The problem is mainly step 4: I am not sure what I had picked
> > as boot loader location from the "Advanced Boot Loader 
> > Configuration" screen (MBR vs. first sector of boot 
> > partition). So I need to figure out
> >  a) what the location was, and
> >  b) how to get the boot loader installed there manually (I've 
> > always just used the automated install for the boot loader).
> > 
> > 
> > Is my assumption about steps 1-5 correct?
> > Does anybody have any hints regarding how to do step 4?
> > 
> > And then for the future: how can I be better prepared for
> > this next time? Is there a way to capture the partition and 
> > boot loader information (at a point before the disk actually 
> > goes bad) and then restore it to an identical drive in a more 
> > automated fashion?
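
For next time, one possible approach is to save the partition table and the contents of /boot while the disk is still healthy. "sfdisk -d" dumps the table in a form that sfdisk can read back in, "dd" can copy the first sector of the disk, and "tar" can archive /boot. The sketch below only builds those command lines for review; nothing is executed. The function name backup_commands and the /root/backup directory are assumptions for illustration.

```python
import shlex


def backup_commands(disk, dest_dir, boot_dir="/boot"):
    """Return shell command lines that save the partition table and
    first sector of `disk` plus an archive of `boot_dir` to `dest_dir`."""
    q = shlex.quote
    name = disk.rsplit("/", 1)[-1]  # '/dev/sda' -> 'sda'
    return [
        # Partition table, restorable with: sfdisk DISK < FILE
        f"sfdisk -d {q(disk)} > {q(dest_dir + '/' + name + '.sfdisk')}",
        # First sector of the disk (boot code plus partition table).
        f"dd if={q(disk)} of={q(dest_dir + '/' + name + '.mbr')} bs=512 count=1",
        # Contents of the boot partition (kernels, initrd, grub files).
        f"tar -czf {q(dest_dir + '/boot.tar.gz')} -C {q(boot_dir)} .",
    ]


if __name__ == "__main__":
    for command in backup_commands("/dev/sda", "/root/backup"):
        print(command)
```

Even with such a backup I would expect to rerun the grub setup step on the new disk, since the loader records where its later stages live on disk and a copied sector alone may point at the wrong blocks. The backup should of course be kept on a different machine or disk.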
> > 
> > Thanks,
> > 
> > MARK
> > 
> > 
> > --
> > fedora-list mailing list
> > fedora-list@xxxxxxxxxx
> > To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
> > 