Re: RAID 10 w AHCI w NCQ = Spurious I/O error

Nestor A. Diaz wrote:
Hello People,

I need your help; this problem is driving me crazy.

Did you know there is a raid list?

I have created a RAID 10 using a RAID0 configuration on top of two RAID1 devices (all software RAID), like this:

You have created a raid 0+1; raid10 is a different thing. Given your setup, raid10 is probably what you *should* have created.

Personalities : [raid0] [raid1]
md4 : active raid0 md2[0] md3[1]
     605071872 blocks 64k chunks

md0 : active raid1 sdd3[3] sda3[0] sdc3[2] sdb3[1]
     9791552 blocks [4/4] [UUUU]

md3 : active raid1 sdd2[2](F) sdb2[0]
     302536000 blocks [2/1] [U_]

md1 : active raid1 sdd1[3] sda1[0] sdc1[2] sdb1[1]
     240832 blocks [4/4] [UUUU]

md2 : active raid1 sda2[0] sdc2[1]
     302536000 blocks [2/2] [UU]

unused devices: <none>
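
For readers following along, the degraded member can be picked out of this listing mechanically. The following is only a minimal sketch, assuming the /proc/mdstat layout shown above (an "(F)" suffix on failed members and a "[U_]" status string); the function name and the trimmed SAMPLE text are illustrative, not part of any md tool.

```python
import re

# Trimmed copy of the /proc/mdstat listing quoted above.
SAMPLE = """\
Personalities : [raid0] [raid1]
md4 : active raid0 md2[0] md3[1]
     605071872 blocks 64k chunks

md3 : active raid1 sdd2[2](F) sdb2[0]
     302536000 blocks [2/1] [U_]

md2 : active raid1 sda2[0] sdc2[1]
     302536000 blocks [2/2] [UU]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return arrays that have a failed member or a '_' in their status string."""
    arrays = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*active\s+\S+\s+(.*)$", line)
        if header:
            current = header.group(1)
            # Members marked "(F)" have been failed by the md driver.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2))
            arrays[current] = {"failed": failed, "status": None}
            continue
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current:
            arrays[current]["status"] = status.group(1)
    return {
        name: info
        for name, info in arrays.items()
        if info["failed"] or (info["status"] and "_" in info["status"])
    }

print(degraded_arrays(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of SAMPLE.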

But the sdd device sometimes fails. I have changed the hard disk, checked the older SATA drive, and reformatted it using mke2fs -c -c (to check for media errors, both read and write); no media problems were found. I changed the SATA disk and the problem remains, even with a new SATA hard disk.

The system is a Supermicro 5015-mt+ server with an ICH7 AHCI controller.

[___snip___]

The RAID 1 builds perfectly, but five days after that, the system shows:

end_request: I/O error, dev sdd, sector 144006110
raid1: Disk failure on sdd2, disabling device.
Operation continuing on 1 devices
end_request: I/O error, dev sdd, sector 144006222
end_request: I/O error, dev sdd, sector 144268814
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdb2
disk 1, wo:1, o:0, dev:sdd2
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdb2
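
When comparing incidents like this one, it can help to collect the failing sectors per device from the kernel log. This is a rough sketch, assuming the "end_request: I/O error, dev X, sector N" message format quoted above; the helper name is made up for illustration, and the result says nothing by itself about the cause.

```python
import re
from collections import defaultdict

# The end_request lines from the log excerpt quoted above.
LOG = """\
end_request: I/O error, dev sdd, sector 144006110
raid1: Disk failure on sdd2, disabling device.
end_request: I/O error, dev sdd, sector 144006222
end_request: I/O error, dev sdd, sector 144268814
"""

def io_error_sectors(log_text):
    """Map each device name to the list of sectors reported with I/O errors."""
    errors = defaultdict(list)
    pattern = r"end_request: I/O error, dev (\w+), sector (\d+)"
    for match in re.finditer(pattern, log_text):
        errors[match.group(1)].append(int(match.group(2)))
    return dict(errors)

for dev, sectors in io_error_sectors(LOG).items():
    # A wide spread of sectors on one device is one hint to compare across incidents.
    print(dev, sectors, "spread:", max(sectors) - min(sectors), "sectors")
```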

Hardware error, almost certainly. If you're using a hub, I would suspect that first, then cables and heat problems, then the controller, in rough order of likelihood.

A week earlier I got (under 2.6.18) the following message:

[___lots more snip___]


I have updated from 2.6.18 to 2.6.22 hoping the problem would go away, but it remains and I don't know what could be causing it. The problem always happens on /dev/sdd. I use LVM on top of the RAID 10 software device.

I am not sure whether the problem is because I created the RAID10 by building two RAID1 devices and then a RAID0 on top of them, or whether I should have used mdadm with the level 10 option.

Any suggestions will be welcome.

Do you ever get errors in partitions which are not part of the raid0+1 setup, like md1? If not, look at your partition tables to see if you have any strange values there.

Are all drives at the same firmware level?
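
One quick way to compare firmware levels is sketched below. It assumes the drives are exposed through the SCSI layer and that /sys/block/<dev>/device/rev holds the revision string reported by the drive (this field may be truncated, so smartctl or hdparm output is more reliable); the function name and the sys_block parameter are illustrative.

```python
from pathlib import Path

def firmware_revisions(devices, sys_block="/sys/block"):
    """Return {device: revision string, or None if the sysfs file is unreadable}."""
    revisions = {}
    for dev in devices:
        rev_file = Path(sys_block) / dev / "device" / "rev"
        try:
            revisions[dev] = rev_file.read_text().strip()
        except OSError:
            # Device missing, or sysfs does not expose "rev" for it.
            revisions[dev] = None
    return revisions
```

Calling firmware_revisions(["sda", "sdb", "sdc", "sdd"]) and checking whether sdd differs from the other three would answer the question for this system.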

--
Bill Davidsen <[email protected]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
