Re: Too much hard drives failing


Pablo,

I wouldn't think this is Reiser/Fedora related. It could be BIOS related or Seagate firmware related (e.g. if you are using RAID, you may need the *exact* same Seagate firmware installed on each disk).

Not sure if you have the option of turning NCQ off?

This could be a faulty batch of disks too; I would contact Seagate and check. In my case, I'm one of those guys who believes in hardware RAID for production machines, so that failures can be resolved by anyone through disk replacement.

How did you work the 3Ware firmware thing?
I put 3ware under extreme pressure to produce a fix (patch) that allowed me to revert their firmware to a previous version (not normally possible), and that fixed it. 3ware recently rewrote their firmware code base, which caused me lots of problems.


Albert.


Pablo Povarchik wrote:
	Albert, thanks a lot for your answer.

The only log I can trace back now is "Device: /dev/sda, ATA error count increased from 11605 to 11610", because it's stored in our Request Tracker.
I also have some notes about:
ATA: abnormal status 0xD0 on port 0xE407

Jan 26 06:12:29 ns kernel: ata2: command 0x25 timeout, stat 0xd0 host_stat 0x21
Jan 26 06:12:29 ns kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Jan 26 06:12:29 ns kernel: ata2: status=0xd0 { Busy }
Jan 26 06:12:29 ns kernel: SCSI disk error : host 3 channel 0 id 0 lun 0 return code = 8000002
Jan 26 06:12:29 ns kernel: Current sd08:10: sns = 70  b
Jan 26 06:12:29 ns kernel: ASC=47 ASCQ= 0
Jan 26 06:12:29 ns kernel: Raw sense data:0x70 0x00 0x0b 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x47 0x00 0x00 0x00 0x00 0x00
Jan 26 06:12:29 ns kernel:  I/O error: dev 08:10, sector 0
Jan 26 06:12:29 ns kernel: ATA: abnormal status 0xD0 on port 0xE407
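
For reference, the status byte 0xd0 that appears in these lines can be broken into individual ATA status register flags. The following is only a minimal sketch: the helper name decode_ata_status is hypothetical, and the bit names follow the traditional ATA layout (later revisions of the standard rename or obsolete some of them).

```python
# Classic ATA status register bits, highest bit first.
# Names follow the traditional layout; newer ATA revisions
# rename or obsolete some of these (assumption for illustration).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the flags set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # 0xd0 is the value reported in the kernel messages above.
    print(decode_ata_status(0xD0))
```

Decoding 0xd0 this way gives BSY, DRDY and DSC. While BSY is set the remaining bits are not considered valid, which is consistent with the kernel printing only "{ Busy }".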

This is taken from one of those broken disks that is still attached to the
second port of one of the servers; I left it there only to try to figure
this out.

I tried replacing the cables, etc., but the disks are really broken.
After replacing them it works, maybe with no further errors.

Yes, I can remember different error messages.

If you need more logs, I can only wait for it to happen again.

How did you work the 3Ware firmware thing?

Thanks a lot for the help

Pablo



On Fri, 2007-01-26 at 04:57 +0000, Albert Graham wrote:
Hi Pablo,

What kind of failures are these? Hardware, or disk corruption/software?

I have about 40 SM servers (with SATA2, SG 500GB etc., running FC5), also using Reiser 3. I also had failures, which I eventually traced to the 3ware controller firmware; however, I have not had any hardware failures.


Thanks.
Albert.


Pablo Povarchik wrote:
Hello there

I'm starting here because I really don't know which would be the best place
to look for help. If this is not the correct list, please advise. And if
you can recommend a mailing list that is right for this, please let me know.

That said, let's get to the point:

We have recently added 20 servers to our little farm, 7 of which have
had hard failures on their disks (SATA, Seagate, good brand-new SuperMicro
boxes).

The fact is that these failures started coming up right after we decided to
move to reiserfs.

Can 7 out of 20 hard drives be defective (yes, of course, but what is
the probability of this)?
Can this be related to reiserfs in any way?
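
One rough way to put a number on the probability question is a binomial model, assuming each drive fails independently with the same probability. The sketch below is only an illustration: the 3% per-drive failure probability is an arbitrary assumption, not a figure from this thread or from Seagate, and the function name prob_at_least is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k failures among n independent drives,
    each failing with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # ASSUMPTION: 3% per-drive failure probability, chosen only for illustration.
    assumed_p = 0.03
    print(prob_at_least(7, 20, assumed_p))
```

Under these assumptions the result is very small, which would suggest a shared cause (for example a faulty batch or a firmware problem) rather than independent bad luck. A different assumed failure probability changes the number, so it should be replaced with whatever rate is considered realistic.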

SATA2
Seagate
SuperMicro
Fedora Core 5

Any help will be more than appreciated


Thanks a lot


