Data corruption on SiI 3114?

Greetings all. The short version first: I'm having problems with data corruption on a software raid5 partition that uses 4 SATA drives hanging off an add-on SiI 3114 card. This has been going on for a couple of months now; several times I've thought some action fixed it, only to have the corruption return. I recently started using iozone (source from http://www.iozone.org/src/current/iozone3_263.tar), which generally triggers the problem fairly quickly:
Test #1:
./iozone -R -g 4G -a -+d > ~/iozone.report
(blahblahblah)
524288 8192 75582 88670 141979 142247 141863 116981 142000 135012 142464 69620 77813 142197 142480
524288 16384 81263 93395 142279 142543 142399 114740 142307 135391 141962 70295 92522 141945 142090
1048576 64 81280 88546
Error in file: Position 0 0 0
Error in file: Position 93847552
Record # 1432 Record size 64 kb
(dropped the Char line since it has high ASCII)
Found pattern: Hex >>ffffffff<< Expecting >>fffffffb<<
Test #2:
262144 8192 64311 110685 136845 126089 125882 69296 137398 101758 138808 68244 73281 137469 138596
262144 16384 73250 87237 137979 138027 127386 69802 130037 65369 133270 74445 90564 123972 102779
524288 64 74796 142936
Error in file: Position 1664 0 0
Error in file: Position 473616384
Record # 7226 Record size 64 kb
(dropped the Char line since it has high ASCII)
Found pattern: Hex >>ffffffff<< Expecting >>fffffffb<<
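For comparison runs, the same invocation can be pointed at a specific mount with iozone's -f option, so one pass exercises the suspect raid5 and another the raid1 side. A sketch; the test-file and report paths are my own naming:

```shell
# -f places the test file on the filesystem under test; -+d enables the
# write/read-back verification that catches the corruption.
./iozone -R -a -g 4G -+d -f /home/iozone.tmp > ~/iozone.raid5.report   # suspect raid5
./iozone -R -a -g 4G -+d -f /var/iozone.tmp  > ~/iozone.raid1.report   # raid1 control
```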
Other tests I've done:
memtest86 and mprime both run for a couple days without showing problems.
iozone running on other partitions does not error.

I'm trying to narrow down which piece of hardware or software is flaky, but I'm having a difficult time doing so. This same server has a pair of parallel ATA drives hanging off the motherboard, running software raid1, that do not expose the problem. That would seem to eliminate everything not directly associated with the raid5 setup, leaving the raid5 driver, the sata_sil driver, the SATA card itself, the drive cabling, or the drives themselves. But the raid5 driver should catch errors from the sata_sil driver on down. That leaves either a memory/CPU problem (which memtest86 and mprime didn't find) or a bug in raid5 (which I find hard to believe, as widely used as it is).

Any suggestions on how to troubleshoot this are appreciated. My key problem is that I can't really afford to lose the data on the raid5 partition - I've backed up all the absolutely critical things, but I just don't have the backup capacity for everything, and would rather not lose it.
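One way to take iozone itself out of the picture and exercise just the storage stack is a plain write, read-back, and compare pass. A minimal sketch, assuming hypothetical SRC/DST paths; the file should be larger than RAM (1 GiB here) so the read-back comes from disk rather than the page cache:

```shell
#!/bin/sh
# Sketch: write a reference file on the known-good raid1 side, copy it
# to the suspect raid5 mount, flush caches, and compare byte-for-byte.
# SRC/DST paths and the 2 GiB size are assumptions -- adjust to taste.
SRC=/var/tmp/refdata        # on the raid1 side (has not shown the bug)
DST=/home/refdata           # on the suspect raid5 /home

dd if=/dev/urandom of="$SRC" bs=1M count=2048 2>/dev/null
cp "$SRC" "$DST"
sync
echo 3 > /proc/sys/vm/drop_caches   # needs root; present since 2.6.16
if cmp "$SRC" "$DST"; then
    echo "pass: no mismatch this round"
else
    echo "FAIL: mismatch -- corruption is below the filesystem layer"
fi
rm -f "$SRC" "$DST"
```

Repeating this in a loop overnight should show whether the corruption needs iozone's access pattern, or hits any large sequential write.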
System details:
Motherboard: Tyan Tiger MPX (S2466N), with 2 AMD Athlon MP 2000+ processors and 1 GiB of RAM.
Kernel: a number of different kernels, ranging from the Debian-packaged 2.6.8-1 and grsec 2.6.14-1 up through the currently-installed 2.6.17.1, downloaded from kernel.org.
Drive configurations:
SiI 3114 card using the sata_sil driver, with 4 ST3300831AS drives connected. These 4 drives are combined using the Linux raid5 driver into a single 826GiB partition, mounted as /home.
Onboard IDE with 2 ye-ol generic 40G drives, in 5 separate raid1 instances providing /, /tmp, /usr, /var, and /chroot.
All partitions are using ext3.
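For what it's worth, the md layer on 2.6.16 and later can at least check raid5 parity consistency across the four drives without touching the data. A sketch, assuming the array is /dev/md0 (the device name is a guess - check /proc/mdstat for the real one):

```shell
# /dev/md0 is an assumption -- substitute the md device from /proc/mdstat.
cat /proc/mdstat                              # array state and member drives
mdadm --detail /dev/md0                       # per-device status and events
echo check > /sys/block/md0/md/sync_action    # read-only parity scrub (2.6.16+)
cat /sys/block/md0/md/mismatch_cnt            # nonzero after the scrub = inconsistent stripes
```

A nonzero mismatch_cnt after the scrub would point at corruption happening on the way to disk, since parity is computed from the same buffers as the data.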

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
