On Thu, 2006-02-02 at 10:13 +0000, Terry Barnaby wrote:
> Terry Barnaby wrote:
> > Gilboa Davara wrote:
> >
> >> On Wed, 2006-02-01 at 12:01 +0000, Terry Barnaby wrote:
> >>
> >>> Gilboa Davara wrote:
> >>>
> >>>> By default software RAID1/5/6 support on-line drive
> >>>> kill/remove/rebuild/etc.
> >>>> However, seems that the MD driver is unaware of the dead drive.
> >>>>
> >>>> What does /proc/mdstat say?
> >>>>
> >>>> Gilboa
> >>>>
> >>>>
> >>>
> >>> After removing the SATA cable on /dev/sdd, if I access a file there is a
> >>> long delay and then the program returns with no error but no data.
> >>> For example: "cat /data/test-file" will delay and then exit with status
> >>> of "0" but no file contents are displayed.
> >>>
> >>> The kernel is: 2.6.14-1.1656_FC4smp: I get the following kernel
> >>> messages:
> >>>
> >>> Feb 1 11:51:37 library kernel: ata2: command 0x35 timeout, stat 0x0 host_stat 0x61
> >>> Feb 1 11:51:38 library sshd(pam_unix)[13027]: session opened for user root by root(uid=0)
> >>> Feb 1 11:52:07 library kernel: ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
> >>> Feb 1 11:53:07 library last message repeated 2 times
> >>> Feb 1 11:54:37 library last message repeated 3 times
> >>> Feb 1 11:55:01 library crond(pam_unix)[13091]: session opened for user root by (uid=0)
> >>> Feb 1 11:55:01 library crond(pam_unix)[13091]: session closed for user root
> >>> Feb 1 11:55:07 library kernel: ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
> >>>
> >>> /proc/mdstat has:
> >>> Personalities : [raid1] [raid5]
> >>> md1 : active raid1 sdc1[0]
> >>>       20482752 blocks [2/1] [U_]
> >>>
> >>> md2 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
> >>>       873196800 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
> >>>
> >>> md0 : active raid1 sdb1[1] sda1[0]
> >>>       20482752 blocks [2/2] [UU]
> >>>
> >>> unused devices: <none>
> >>>
> >>> The output of "mdadm -Q --detail /dev/md2" is:
> >>> /dev/md2:
> >>>         Version : 00.90.02
> >>>   Creation Time : Tue Jan 31 14:14:07 2006
> >>>      Raid Level : raid5
> >>>      Array Size : 873196800 (832.75 GiB 894.15 GB)
> >>>     Device Size : 291065600 (277.58 GiB 298.05 GB)
> >>>    Raid Devices : 4
> >>>   Total Devices : 4
> >>> Preferred Minor : 2
> >>>     Persistence : Superblock is persistent
> >>>
> >>>     Update Time : Wed Feb 1 11:51:07 2006
> >>>           State : active
> >>>  Active Devices : 4
> >>> Working Devices : 4
> >>>  Failed Devices : 0
> >>>   Spare Devices : 0
> >>>
> >>>          Layout : left-symmetric
> >>>      Chunk Size : 64K
> >>>
> >>>            UUID : 56bd5037:9d9b9018:eb8f01d6:94155776
> >>>          Events : 0.230
> >>>
> >>>     Number   Major   Minor   RaidDevice   State
> >>>        0       8        3        0        active sync   /dev/sda3
> >>>        1       8       19        1        active sync   /dev/sdb3
> >>>        2       8       35        2        active sync   /dev/sdc3
> >>>        3       8       51        3        active sync   /dev/sdd3
> >>>
> >>> Terry
> >>
> >>
> >>
> >> Very weird.
> >> I've got a number of both IDE, SATA and SCSI RAID5 setups and I've never
> >> seen such a problem.
> >> What happens if you try to access the RAID5 array?
> >> (hdparm -tT /dev/md2)
> >>
> >> Gilboa
> >>
> >
> > I haven't tried "hdparm -tT /dev/md2", but if I access a file there is a
> > long delay and then the program returns with no error but no data.
> > For example: "cat /data/test-file" will delay and then exit with status
> > of "0" but no file contents are displayed.
> >
> > This is VERY VERY BAD !
> >
> > I really think this must be a bug, possibly in the SATA driver in Fedora
> > Core 4's 2.6.14-1.1656_FC4smp kernel. I have a spare SCSI system with
> > 3 SCSI disks, I will set that up and see how this handles the situation ...
> >
> > Terry
> >
> I have just set up a SCSI raid array and tried unplugging a drive.
> All works as expected here, i.e. there are error messages from the raid
> system and an email to root and the system continues running fine.
>
> So it looks like a bug in the SATA driver ....
>
> Terry

I'd suggest you file a bug report at both bugzilla.kernel.org and
bugzilla.redhat.com.

Gilboa
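
P.S. If it helps when writing up the report, the symptom in the /proc/mdstat output quoted above can be checked mechanically. Below is a minimal Python sketch, not a tested tool: the function name degraded_arrays and the SAMPLE_MDSTAT string are my own illustration (the sample is copied from the quoted output), and it assumes the status line always ends in a "[n/m] [UU_]"-style member map. It flags any array whose member map contains an underscore.

```python
import re

# Sample copied from the /proc/mdstat output quoted in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md1 : active raid1 sdc1[0]
      20482752 blocks [2/1] [U_]

md2 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      873196800 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sdb1[1] sda1[0]
      20482752 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays MD itself reports as degraded.

    Hypothetical helper for illustration only.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array stanza starts with a line such as "md2 : active raid5 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following line ends with e.g. "[4/4] [UUUU]"; "_" marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(3):
                degraded[current] = status.group(3)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

Run against the quoted output, this reports md1 only. md2 still shows [4/4] [UUUU] with sdd3 listed, so as far as MD is concerned that array is healthy, which is exactly the mismatch worth putting in the bug report.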