2.6.12-rc2: Promise SATA150 TX4 failures

Hi all,

just tried 2.6.12-rc2 and I still have the same errors from my SATA
disks as with 2.6.11.  The setup is a bit complex.  The relevant parts (I
think) are:

Adaptec AHA-2940UW SCSI-controller, attached are:
1 harddisk /dev/sda
1 DDS3 streamer /dev/st0

Promise SATA150 TX4 controller, attached are:
2 identical harddisks /dev/sdb and /dev/sdc

/dev/sda consists of the root partition, a swap partition and 4 other
partitions that are physical volumes for dm volume group /dev/vg1

/dev/sdb and /dev/sdc have two partitions each: the first partitions of
both disks form an md RAID-0 array /dev/md0, and the second partitions
form an md RAID-1 array /dev/md1

/dev/md0 and /dev/md1 are the physical volumes for dm volume groups
/dev/vg2 and /dev/vg3 resp.

To trigger the failure:
- For all logical volumes in /dev/vg1, /dev/vg2 and /dev/vg3 a snapshot is
  created.
- All snapshots are mounted read-only in a "snapshot hierarchy" under
  /snap.
- A backup to tape is taken using something like:
  find /snap -print | cpio -oaH crc -F /dev/st0
  Backup must go to tape, no problem with /dev/null
- At this point, some additional I/O on the SATA disks causes the whole
  box to hang. In most cases some errors are written to syslog; they are
  always similar:
Apr  9 01:30:35 bear kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Apr  9 01:30:35 bear kernel: ata2: called with no error (51)!
Apr  9 01:30:35 bear kernel: SCSI error : <2 0 0 0> return code = 0x8000002
Apr  9 01:30:35 bear kernel: sdc: Current: sense key: Medium Error
Apr  9 01:30:35 bear kernel:     Additional sense: Unrecovered read error - auto reallocate failed
Apr  9 01:30:35 bear kernel: end_request: I/O error, dev sdc, sector 43100350
Apr  9 01:30:35 bear kernel: raid1: Disk failure on sdc2, disabling device.
Apr  9 01:30:35 bear kernel: ^IOperation continuing on 1 devices
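
The trigger steps above can be sketched as a script. This is only a dry-run
outline: it prints the commands instead of running them. The snapshot size
(1G), the "snap-" name prefix, the mount points under /snap/<vg>/<lv> and the
volume names in the example are assumptions, not taken from the actual setup;
only the cpio line is the one quoted above.

```shell
#!/bin/sh
# Dry-run sketch of the trigger sequence: prints commands, runs nothing.
# Assumptions (not from the report): snapshot size 1G, snapshot names
# prefixed with "snap-", mount points /snap/<vg>/<lv>.

# Print the snapshot and mount commands for one logical volume.
snapshot_cmds() {
    vg=$1
    lv=$2
    echo "lvcreate -s -L 1G -n snap-$lv /dev/$vg/$lv"
    echo "mkdir -p /snap/$vg/$lv"
    echo "mount -o ro /dev/$vg/snap-$lv /snap/$vg/$lv"
}

# Print the backup command quoted in the report.
backup_cmd() {
    echo "find /snap -print | cpio -oaH crc -F /dev/st0"
}

# Print the whole sequence for a list of "<vg>/<lv>" arguments.
plan() {
    for target in "$@"; do
        snapshot_cmds "${target%%/*}" "${target#*/}"
    done
    backup_cmd
}

# Example with hypothetical volume names:
plan vg1/home vg2/scratch vg3/data
```

The additional I/O on the SATA disks that finally causes the hang is not part
of this sketch.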

The errors are always reported for /dev/sdc2, the second device of a
RAID-1 array.  After reboot I am able to raidhotadd the failed partition
without problems.
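
To check this against a longer log, the raid1 failure messages can be counted
per device. A minimal sketch, assuming the kernel messages have the format
shown in the excerpt above; the function name disk_failures_by_dev is only
for illustration.

```shell
#!/bin/sh
# Count "raid1: Disk failure on <dev>" messages per device.
# Reads syslog text on stdin, prints "<count> <device>" for each device.
disk_failures_by_dev() {
    sed -n 's/.*raid1: Disk failure on \([a-z0-9]*\),.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Example using two lines from the log excerpt above; prints "1 sdc2".
disk_failures_by_dev <<'EOF'
Apr  9 01:30:35 bear kernel: end_request: I/O error, dev sdc, sector 43100350
Apr  9 01:30:35 bear kernel: raid1: Disk failure on sdc2, disabling device.
EOF
```

For a real log the function would be fed from the syslog file instead of the
here-document; where that file lives depends on the syslog configuration.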

The problem is 100% reproducible.

The hang is not a "hard hang": X keeps running and the watchdog doesn't
fire, but no new processes can be started.  Syslog entries stop after some
time (from a few seconds to several minutes).

The problem appeared somewhere between 2.6.10 and 2.6.11.
2.6.10:		ok
2.6.10-ac8:	ok
2.6.10-ac11:	failed
2.6.11:		failed
2.6.12-rc2:	failed
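
Going by the table, the boundary lies between 2.6.10-ac8 (last good) and
2.6.10-ac11 (first bad). A small sketch that pulls this boundary out of such
a result list, assuming the entries are listed in release order as above; the
function name regression_window is only for illustration.

```shell
#!/bin/sh
# Print the last "ok" entry before the first "failed" entry.
# Input: "<version>: <result>" lines in release order, on stdin.
regression_window() {
    awk '
        $2 == "ok" && !found     { good = $1 }
        $2 == "failed" && !found { bad = $1; found = 1 }
        END {
            sub(/:$/, "", good); sub(/:$/, "", bad)
            print "last good:", good
            print "first bad:", bad
        }
    '
}

regression_window <<'EOF'
2.6.10:		ok
2.6.10-ac8:	ok
2.6.10-ac11:	failed
2.6.11:		failed
2.6.12-rc2:	failed
EOF
```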

I'd be glad if there were a solution for this problem, as it prevents
me from using any newer kernel.

-jo

-- 
-rw-r--r--  1 jo users 63 2005-04-09 09:31 /home/jo/.signature