Terry Barnaby wrote:
Gilboa Davara wrote:
On Wed, 2006-02-01 at 12:01 +0000, Terry Barnaby wrote:
Gilboa Davara wrote:
By default, software RAID1/5/6 supports on-line drive
kill/remove/rebuild/etc.
However, it seems that the MD driver is unaware of the dead drive.
What does /proc/mdstat say?
Gilboa
After removing the SATA cable on /dev/sdd, if I access a file there is a
long delay and then the program returns with no error but no data. For
example, "cat /data/test-file" will delay and then exit with a status of "0",
but no file contents are displayed.
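A minimal sketch of how this kind of silent short read could be caught from a script rather than by eye. It assumes that stat() on the file still reports the original size (for example from cached metadata) while the read itself returns nothing. The function name read_matches_size is hypothetical; /data/test-file is the path from the example above.

```python
import os

def read_matches_size(path, chunk_size=1 << 20):
    """Read a regular file to EOF and compare the byte count with st_size.

    Returns (bytes_read, expected_size, ok). A mismatch with no exception
    raised is the "exit status 0 but no data" symptom described above.
    """
    expected = os.stat(path).st_size
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # EOF, or a read that silently returned nothing
                break
            total += len(chunk)
    return total, expected, total == expected

if __name__ == "__main__":
    # Path taken from the report; adjust for the system under test.
    path = "/data/test-file"
    if os.path.exists(path):
        got, want, ok = read_matches_size(path)
        print(f"{path}: read {got} of {want} bytes, {'OK' if ok else 'SHORT READ'}")
```

On a healthy array the two numbers should agree. A shortfall without an I/O error would point at the layer below the filesystem swallowing the failure.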
The kernel is 2.6.14-1.1656_FC4smp. I get the following kernel
messages:
Feb 1 11:51:37 library kernel: ata2: command 0x35 timeout, stat 0x0 host_stat 0x61
Feb 1 11:51:38 library sshd(pam_unix)[13027]: session opened for user root by root(uid=0)
Feb 1 11:52:07 library kernel: ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
Feb 1 11:53:07 library last message repeated 2 times
Feb 1 11:54:37 library last message repeated 3 times
Feb 1 11:55:01 library crond(pam_unix)[13091]: session opened for user root by (uid=0)
Feb 1 11:55:01 library crond(pam_unix)[13091]: session closed for user root
Feb 1 11:55:07 library kernel: ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
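To see how often the port is timing out, the log excerpt above can be tallied with a short script. This is only a sketch: the regular expressions are written against the message format shown in this excerpt, the names count_ata_timeouts and SAMPLE_LOG are hypothetical, and it assumes each log entry is on a single line (the wrapped lines joined back together).

```python
import re
from collections import Counter

# Patterns written against the excerpt above, not a general libata log grammar.
TIMEOUT_RE = re.compile(r"kernel: (ata\d+): command (0x[0-9a-fA-F]+) timeout")
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")

SAMPLE_LOG = """\
Feb 1 11:51:37 library kernel: ata2: command 0x35 timeout, stat 0x0 host_stat 0x61
Feb 1 11:51:38 library sshd(pam_unix)[13027]: session opened for user root by root(uid=0)
Feb 1 11:52:07 library kernel: ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
Feb 1 11:53:07 library last message repeated 2 times
Feb 1 11:54:37 library last message repeated 3 times
Feb 1 11:55:01 library crond(pam_unix)[13091]: session opened for user root by (uid=0)
Feb 1 11:55:01 library crond(pam_unix)[13091]: session closed for user root
Feb 1 11:55:07 library kernel: ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
""".splitlines()

def count_ata_timeouts(lines):
    """Count timeouts per (port, command), folding in syslog repeat lines."""
    counts = Counter()
    last_key = None  # key of the previous line if it was a timeout, else None
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            last_key = (m.group(1), m.group(2))
            counts[last_key] += 1
            continue
        r = REPEAT_RE.search(line)
        if r:
            # "last message repeated N times" refers to the preceding message.
            if last_key is not None:
                counts[last_key] += int(r.group(1))
            continue
        last_key = None  # some unrelated message intervened
    return counts

if __name__ == "__main__":
    for (port, cmd), n in sorted(count_ata_timeouts(SAMPLE_LOG).items()):
        print(f"{port} command {cmd}: {n} timeout(s)")
```

For the excerpt above this gives one timeout for command 0x35 and seven for command 0x25, all on ata2.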
/proc/mdstat has:
Personalities : [raid1] [raid5]
md1 : active raid1 sdc1[0]
      20482752 blocks [2/1] [U_]

md2 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      873196800 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sdb1[1] sda1[0]
      20482752 blocks [2/2] [UU]

unused devices: <none>
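The "[n/m] [U_]" fields are what show whether MD has noticed a missing member: n is the number of devices the array should have, m the number currently active. A rough sketch of reading them programmatically follows. The function name parse_mdstat and the embedded SAMPLE_MDSTAT are illustrative only, and the regular expressions are written against the output above rather than every mdstat variant.

```python
import re

# "md1 : active raid1 sdc1[0]" -> name, state, level
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(\w+)")
# "[2/1] [U_]" -> expected devices, active devices, per-slot status
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md1 : active raid1 sdc1[0]
      20482752 blocks [2/1] [U_]
md2 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      873196800 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md0 : active raid1 sdb1[1] sda1[0]
      20482752 blocks [2/2] [UU]
unused devices: <none>
"""

def parse_mdstat(text):
    """Return {array: {state, level, total, up, degraded}} from mdstat text."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            arrays[current] = {"state": m.group(2), "level": m.group(3),
                               "total": None, "up": None, "degraded": None}
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            total, up = int(s.group(1)), int(s.group(2))
            arrays[current].update(total=total, up=up, degraded=up < total)
    return arrays

if __name__ == "__main__":
    # On a live system, read the text from /proc/mdstat instead.
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        flag = "DEGRADED" if info["degraded"] else "ok"
        print(f"{name}: {info['level']} {info['up']}/{info['total']} {flag}")
```

Run against the output above, this flags md1 as degraded but reports md2 as 4/4, which is the problem: MD still counts sdd3 as an active member even though the drive is unplugged.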
The output of "mdadm -Q --detail /dev/md2" is:
/dev/md2:
Version : 00.90.02
Creation Time : Tue Jan 31 14:14:07 2006
Raid Level : raid5
Array Size : 873196800 (832.75 GiB 894.15 GB)
Device Size : 291065600 (277.58 GiB 298.05 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Wed Feb 1 11:51:07 2006
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 56bd5037:9d9b9018:eb8f01d6:94155776
Events : 0.230
    Number   Major   Minor   RaidDevice   State
       0       8        3        0        active sync   /dev/sda3
       1       8       19        1        active sync   /dev/sdb3
       2       8       35        2        active sync   /dev/sdc3
       3       8       51        3        active sync   /dev/sdd3
Terry
Very weird.
I've got a number of IDE, SATA and SCSI RAID5 setups and I have never
seen such a problem.
What happens if you try to access the RAID5 array?
(hdparm -tT /dev/md2)
Gilboa
I haven't tried "hdparm -tT /dev/md2", but if I access a file there is a
long delay and then the program returns with no error but no data. For
example, "cat /data/test-file" will delay and then exit with a status of "0",
but no file contents are displayed.
This is VERY VERY BAD!
I really think this must be a bug, possibly in the SATA driver in Fedora
Core 4's 2.6.14-1.1656_FC4smp kernel. I have a spare SCSI system with 3 SCSI
disks; I will set that up and see how it handles the situation ...
Terry
I have just set up a SCSI RAID array and tried unplugging a drive.
All works as expected here, i.e. there are error messages from the RAID
system and an email to root, and the system continues running fine.
So it looks like a bug in the SATA driver ....
Terry