Software RAID not kicking devices out of the array

Hello,

I have a server with 5 Serial ATA disks, 4 of them combined into 2
software RAID1 arrays. Today this server stopped responding (no ping,
nothing on the screen, even Num Lock not working), and after inspecting
the logs I found 5 records like:

Mar  9 19:30:00 shaman ata5: status=0x51 { DriveReady SeekComplete Error }
Mar  9 19:30:00 shaman ata5: error=0x0c { DriveStatusError }

(not consecutive) before the freeze. The first one was at 19:03, about
half an hour before the freeze. I'm pretty sure that the reason the
server stopped responding is a hard drive failure.
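
For reference, the status byte in the log records above can be decoded by hand. The following is only a sketch: the bit names are assumed to follow the standard ATA status register layout, with labels picked to match the text in the log, and the function name decode_status is made up for illustration.

```python
# Bit masks of the ATA status register, highest bit first.
# The labels are assumptions chosen to match the wording in the log.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}


def decode_status(value):
    """Return the names of the bits set in a status byte, highest bit first."""
    return [name for mask, name in STATUS_BITS.items() if value & mask]


print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
```

So 0x51 means the drive reports itself ready and the seek as complete, but with the error bit set.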

So the question is: isn't RAID supposed to kick the device out of the
array in case of an I/O error? Sure, I can write a script that monitors
the logs and kicks drives out, but this does not sound like a good solution.
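
To make the workaround concrete, this is roughly what such a script would look like. It is a minimal sketch, not something I run: the threshold, the function names, and the port-to-device map are all assumptions (the log only names the port, e.g. ata5, so the mapping to an array member has to be maintained by hand). It only builds the mdadm command lines and does not execute them.

```python
import re
from collections import Counter

# Matches kernel records such as:
#   Mar  9 19:30:00 shaman ata5: status=0x51 { DriveReady SeekComplete Error }
ATA_ERROR = re.compile(r"\b(ata\d+): status=0x[0-9a-fA-F]+ \{[^}]*\bError\b")


def count_ata_errors(lines):
    """Count error-status records per ATA port name (e.g. 'ata5')."""
    counts = Counter()
    for line in lines:
        match = ATA_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def fail_commands(counts, port_to_member, threshold=3):
    """Build (but do not run) mdadm commands for ports at or above threshold.

    port_to_member is a hypothetical hand-maintained map from port name to
    (array, member device); the log records alone do not provide it.
    """
    return [
        ["mdadm", array, "--fail", member]
        for port, (array, member) in sorted(port_to_member.items())
        if counts.get(port, 0) >= threshold
    ]
```

For example, with a made-up mapping {"ata5": ("/dev/md1", "/dev/sdd1")} and 5 matching records for ata5, fail_commands would return a single command marking /dev/sdd1 as failed in /dev/md1.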

The drive was still in the array after the reboot, and it continued to
issue such errors until I removed it from the array with mdadm -f.

I'm attaching dmesg of the machine after reboot.

Anton Titov
Host.bg

Attachment: dmesg.shaman.gz
Description: GNU Zip compressed data
