Re: Scsi errors with Megaraid 300-8x

Johan Groth wrote:
Hi,
ever since I upgraded my server from a dual Opteron 244 (mobo Tyan 2885) system to a dual dual-core Opteron 285 (mobo Tyan 2895) system, I've been getting read errors that freeze the system, which in turn has caused my disk-based backup software (faubackup) to stop working. I think it is faubackup that triggers the bug.

I get these errors in the log:
Aug 20 06:35:08 jaguar kernel: sd 2:1:0:0: SCSI error: return code = 0x40001
Aug 20 06:35:56 jaguar kernel: end_request: I/O error, dev sda, sector 616924530
Aug 20 06:36:03 jaguar kernel: sd 2:1:0:0: SCSI error: return code = 0x40001
Aug 20 06:36:03 jaguar kernel: end_request: I/O error, dev sda, sector 616924538
..
Aug 20 06:36:07 jaguar kernel: sd 2:1:0:0: SCSI error: return code = 0x40001
Aug 20 06:36:07 jaguar kernel: end_request: I/O error, dev sda, sector 616924538

The last sector is repeated until I reboot the machine. The only change I've made to the RAID configuration is that sdc is now 2x250 MB instead of 4x120 MB, but that array is the target, not the source (sda).
The RAID hardware is an LSI Megaraid 300-8x with the following configuration:
..

That looks like the classic SCSI bad-sector non-recovery bug.
The code in scsi_lib.c, scsi_error.c, and sd.c is currently a
bit of a mess here.
Basically, given an I/O request for 200 sectors with a bad sector
in the middle at number 100, SCSI will often fail sectors 1 through
100 one at a time, retrying the entire remainder of the request after
each attempt.  This takes hours, and results in no data for the
first 99 good sectors.

What it needs to do *instead* is retry each sector individually,
rather than the entire request.  This would result in sectors 1..99
and 101..200 succeeding, and retries/failure only for sector 100.
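
To make the difference concrete, here is a toy simulation of the two strategies. It is only a sketch and not kernel code: the function names and the device model (a command fails outright if it touches any bad sector) are assumptions made for illustration, and it ignores per-command retries and timeouts, which is where most of the real-world delay comes from.

```python
def device_read(start, end, bad_sectors):
    """Hypothetical device: a command covering sectors start..end
    (inclusive) fails if it touches any bad sector."""
    return not any(start <= s <= end for s in bad_sectors)


def fail_leading_sector(start, end, bad_sectors):
    """Behaviour described above: on error, fail only the first sector
    of the request and resubmit the entire remainder."""
    ok, failed, commands = [], [], 0
    while start <= end:
        commands += 1
        if device_read(start, end, bad_sectors):
            ok.extend(range(start, end + 1))
            break
        failed.append(start)  # a good sector may be thrown away here
        start += 1
    return ok, failed, commands


def retry_per_sector(start, end, bad_sectors):
    """Proposed behaviour: on error, retry each sector on its own so
    that only the genuinely bad sectors fail."""
    commands = 1
    if device_read(start, end, bad_sectors):
        return list(range(start, end + 1)), [], commands
    ok, failed = [], []
    for sector in range(start, end + 1):
        commands += 1
        if device_read(sector, sector, bad_sectors):
            ok.append(sector)
        else:
            failed.append(sector)
    return ok, failed, commands


if __name__ == "__main__":
    for strategy in (fail_leading_sector, retry_per_sector):
        ok, failed, commands = strategy(1, 200, {100})
        print(strategy.__name__, len(ok), "ok,", len(failed), "failed,",
              commands, "commands")
```

In this model the first strategy loses sectors 1..100 and only returns 101..200, while the second loses sector 100 alone. The per-sector version issues more commands overall, but only one of them hits the bad sector and has to wait out error recovery.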

A slight optimization would be to fail a bio-sized chunk around sector 100,
rather than just the one sector.

I've got patches that do exactly this, and they work quite well.
But they're probably not "pretty enough" for inclusion.

Cheers


