On Mon, 2006-11-20 at 19:17 -0800, Sean Bruno wrote:
> On Mon, 2006-11-20 at 07:56 -0800, Sean Bruno wrote:
> > I had a disk failure recently and replaced the drive.  After
> > partitioning and such I added my new drive into my Raid 1 and waited for
> > the rebuild to complete.
> >
> > It's been running for about 36 hours trying to rebuild a ~140GB Raid 1,
> > which seems a bit long to me.
> >
> > What's even stranger is that a reboot causes the new disk to be completely
> > removed from the Raid 1 set, and I have to rebuild all of my
> > partitions, not just the ~140GB.
> >
> > Here is 'cat /proc/mdstat' as it currently sits:
> >
> > ----
> > [sean@home-desk ~]$ cat /proc/mdstat
> > Personalities : [raid1]
> > md0 : active raid1 sdb1[0] sda1[1]
> >       1052160 blocks [2/2] [UU]
> >
> > md1 : active raid1 sdb2[0] sda2[1]
> >       4192896 blocks [2/2] [UU]
> >
> > md2 : active raid1 sdb3[2] sda3[1]
> >       151043008 blocks [2/1] [_U]
> >       [==>..................]  recovery = 10.5% (15941760/151043008) finish=52.3min speed=42986K/sec
> >
> > unused devices: <none>
> > ----
> >
> > Where md0 is /boot, md1 is swap and md2 is /
> >
> > sdb is the new disk, sda is the running disk.  If I reboot the machine
> > sdb disappears completely.
> >
> > Any ideas out there?
> >
> > Sean
> >
>
> I guess that this is being caused by a 'real' failure on /dev/sda:
>
> Nov 20 15:58:39 home-desk kernel: RAID1 conf printout:
> Nov 20 15:58:39 home-desk kernel:  --- wd:1 rd:2
> Nov 20 15:58:39 home-desk kernel:  disk 0, wo:1, o:1, dev:sdb3
> Nov 20 15:58:39 home-desk kernel:  disk 1, wo:0, o:1, dev:sda3
> Nov 20 15:58:39 home-desk kernel: RAID1 conf printout:
> Nov 20 15:58:39 home-desk kernel:  --- wd:1 rd:2
> Nov 20 15:58:39 home-desk kernel:  disk 1, wo:0, o:1, dev:sda3
> Nov 20 15:58:39 home-desk kernel: RAID1 conf printout:
> Nov 20 15:58:39 home-desk kernel:  --- wd:1 rd:2
> Nov 20 15:58:39 home-desk kernel:  disk 0, wo:1, o:1, dev:sdb3
> Nov 20 15:58:39 home-desk kernel:  disk 1, wo:0, o:1, dev:sda3
> Nov 20 15:58:39 home-desk kernel: md: syncing RAID array md2
> Nov 20 15:58:39 home-desk kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
> Nov 20 15:58:39 home-desk kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
> Nov 20 15:58:39 home-desk kernel: md: using 128k window, over a total of 151043008 blocks.
>
> Nov 20 16:58:01 home-desk kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Nov 20 16:58:01 home-desk kernel: ata1.00: (BMDMA stat 0x0)
> Nov 20 16:58:01 home-desk kernel: ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)
> Nov 20 16:58:01 home-desk kernel: ata1: EH complete
> Nov 20 16:58:02 home-desk kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Nov 20 16:58:02 home-desk kernel: ata1.00: (BMDMA stat 0x0)
> Nov 20 16:58:02 home-desk kernel: ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)
> Nov 20 16:58:02 home-desk kernel: ata1: EH complete
> Nov 20 16:58:03 home-desk kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Nov 20 16:58:03 home-desk kernel: ata1.00: (BMDMA stat 0x0)
> Nov 20 16:58:03 home-desk kernel: ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)
> Nov 20 16:58:03 home-desk kernel: ata1: EH complete
> Nov 20 16:58:04 home-desk kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ...
> Nov 20 16:59:01 home-desk kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
> Nov 20 16:59:01 home-desk kernel: sda: Current: sense key: Medium Error
> Nov 20 16:59:01 home-desk kernel:     Additional sense: Unrecovered read error - auto reallocate failed
> Nov 20 16:59:01 home-desk kernel: end_request: I/O error, dev sda, sector 307380301
> Nov 20 16:59:01 home-desk kernel: ata1: EH complete
> Nov 20 16:59:01 home-desk kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Nov 20 16:59:01 home-desk kernel: ata1.00: (BMDMA stat 0x0)
> Nov 20 16:59:01 home-desk kernel: ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)
> Nov 20 16:59:01 home-desk kernel: ata1: EH complete
> Nov 20 16:59:01 home-desk kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
> ...
>
> This repeats over-and-over-and-over throughout my logs.  How can I get
> it to rebuild once and then stop?
>
> Sean
>

And finally (if responding to my own post wasn't annoying enough!), if I
rebuild md0 and md1 (skipping md2 for now) and then reboot the machine, it
comes back up with all three arrays degraded!

I start the rebuild on /dev/md0 and /dev/md1 thusly:

[root@home-desk ~]# mdadm --manage --add /dev/md0 /dev/sdb1
mdadm: re-added /dev/sdb1
[root@home-desk ~]# mdadm --manage --add /dev/md1 /dev/sdb2
mdadm: re-added /dev/sdb2

Before reboot (cat /proc/mdstat):

[root@home-desk ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0] sda1[1]
      1052160 blocks [2/2] [UU]

md1 : active raid1 sdb2[0] sda2[1]
      4192896 blocks [2/2] [UU]

md2 : active raid1 sda3[1]
      151043008 blocks [2/1] [_U]

All is well with md0 and md1 for now.  I will work on recovering md2
later.

But if I reboot, sdb1 and sdb2 disappear from my raid configuration, as if
there were somewhere else I needed to add them:

Personalities : [raid1]
md0 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]

md1 : active raid1 sda2[1]
      4192896 blocks [2/1] [_U]

md2 : active raid1 sda3[1]
      151043008 blocks [2/1] [_U]

unused devices: <none>

Any ideas on how to make this 're-add' stick?

Sean
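
P.S. For anyone who wants to check the same thing after each reboot, here is a minimal Python sketch that lists which arrays are running short a member. It is only an illustration: the function name degraded_arrays and the two regular expressions are assumptions based on the /proc/mdstat layout pasted above, not anything mdadm provides, and it only handles arrays reported as "active".

```python
import re

# Matches an array header such as "md2 : active raid1 sdb3[2] sda3[1]".
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
# Matches the member counters such as "[2/1] [_U]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, members)} for arrays missing a member."""
    result = {}
    current = None
    members = []
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line.strip())
        if header:
            current = header.group(1)
            # Strip the "[n]" role suffix from each member device name.
            members = [m.split("[")[0] for m in header.group(4).split()]
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = (expected, active, members)
            current = None
    return result


# Sample text copied from the post-reboot output in this message.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]

md1 : active raid1 sda2[1]
      4192896 blocks [2/1] [_U]

md2 : active raid1 sda3[1]
      151043008 blocks [2/1] [_U]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, (expected, active, members) in sorted(degraded_arrays(SAMPLE).items()):
        print(f"{name}: {active}/{expected} members active ({', '.join(members)})")
```

On a live machine the same function could be fed the real file, e.g. degraded_arrays(open("/proc/mdstat").read()), instead of the pasted sample.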