Hello, I have a server with 5 serial ata disks, 4 of them connected into 2 software raid1 devices. Today this server stopped responding (no ping, nothing on the screen, even numlock not working) and after inspecting logs I found 5 records like: Mar 9 19:30:00 shaman ata5: status=0x51 { DriveReady SeekComplete Error } Mar 9 19:30:00 shaman ata5: error=0x0c { DriveStatusError } (not consequent) before the freeze. First one was at 19:03 - about half an hour before the freeze. I'm pretty sure, that the reason for server stopping responding is hard drive failure. So the question is, isn't raid supposed to kick the device out of the array in case of io error? Surely I can write a script that monitors the logs and kicks drives out, but this does not sound like a good solution. The drive was still in the array after the reboot and after the reboot it continued to issue such errors until I removed the drive from array with mdadm -f. I'm attaching dmesg of the machine after reboot. Anton Titov Host.bg
Attachment:
dmesg.shaman.gz
Description: GNU Zip compressed data
- Prev by Date: Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
- Next by Date: Re: [PATCH] reduce syslog clutter (take 2)
- Previous by thread: [RFC PATCH] ext3 writepage() journal avoidance
- Next by thread: [PATCH] Document Linux's memory barriers [try #4]
- Index(es):