Re: Fedora14: Strange and intermittent very slow disks on server

On 12/18/2010 04:02 PM, Lamar Owen wrote:
> On Saturday, December 18, 2010 03:08:45 am Terry Barnaby wrote:
>> It is strange, however, how the system can run perfectly fine with good
>> fast disk IO for a while and then go into this slow mode. In the slow
>> mode a command can take 30 seconds or more to run on an unloaded system.
>> It smacks of some Linux kernel SATA driver/RAID1 versus WD EARS drive
>> interaction to me.
>
> It's definitely something; the TLER discussions I've seen are just partial explanations at best.
>
>> However, I think I will change the drives. I was hoping to try some WD10EADS
>> ones I have, but after your issues I will look at the RE series or
>> another make ...
>
> The RE series is WD's 'RAID Enterprise' or 'RAID Enabled' (depending on how you look at it) drives, and cost more.  They should work fine in RAID.  The lower cost WD drives have been giving problems in RAID, and not just on Linux.  WD even says they are not designed for RAID.
>
> Please see the responses at:
> http://community.wdc.com/t5/Other-Internal-Drives/1-TB-WD10EARS-desynch-issues-in-RAID/m-p/11559
> Also see:
> http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397
>
> That last link is to WD's FAQ; it explains the root cause of the issue, that of deep cycle recovery (saying point blank that the drive could take *2* *minutes* to recover *one* *sector* in error).  So basically any time the drive hits an error, things slow to a crawl as the iowaits pile up.  This is the info iostat -x 1 will give you; watch the await time (given in milliseconds); I saw awaits of up to 20,000 ms while trying to use my WD15EADS drive in RAID1.

Yes, I have used the WD RE drives in RAID servers for a number of years with
no issues. The system in question is now running on one of these while
awaiting a second.

All of the explanations, TLER included, make sense and argue against using
these drives in RAID, but I don't think they are causing the issues I am
seeing at the moment. I'm sure they would hit at some point in the future,
though.

The WD10EARS drives I am using are new and the SMART reports indicate no bad
sectors, so no error recovery should be going on at the moment. Also, there
is no problem with drives being kicked out of the array, etc.
The disk system can actually work well (60 MBytes/s write, 95 MBytes/s read).
It's just that the system occasionally goes into a very slow mode. Also,
all disks on the system are affected, not just the two RAIDed WD10EARS
ones; a third drive, a WD20EARS, also goes slow when the system gets into
the fault mode.
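
One way to check whether every disk really slows down together is to compute
a per-device await from two snapshots of /proc/diskstats, the kernel counters
that iostat is also built on. The following is a minimal Python sketch, not
something that was run on the system discussed here; the function names are
made up for illustration, and the field positions assume the standard
diskstats layout (reads completed, ms reading, writes completed, ms writing).

```python
import time


def parse_diskstats(text):
    """Return {device: (ios_completed, ms_spent)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        name = fields[2]
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def await_ms(before, after):
    """Average ms per completed request, per device, between two snapshots."""
    old, new = parse_diskstats(before), parse_diskstats(after)
    result = {}
    for dev, (ios_new, ms_new) in new.items():
        if dev not in old:
            continue
        ios_old, ms_old = old[dev]
        completed = ios_new - ios_old
        if completed > 0:  # idle devices have no meaningful await
            result[dev] = (ms_new - ms_old) / completed
    return result


def sample_live(interval=1.0):
    """Take two snapshots `interval` seconds apart on a Linux host."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = f.read()
    return await_ms(before, after)
```

Calling sample_live() in a loop while the system is in the slow mode would
show whether the await rises on all devices at once or only on some of them.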

I noticed it start to happen once. I have a number of separately RAID'ed
partitions on these disks. The rootfs is at the start and there is a
video storage partition (MythTV) near the end. The system went slow while
MythTV was recording video at about 2.5 MBytes/s.

I strongly feel there is a Linux kernel/disk interaction going on here.
Maybe something like this: the Linux driver orders block requests based on
head position, blocking some requests until a reasonable number of others
have been completed. I suspect the drive also orders requests by head
position with a similar algorithm; these drives have a 64 MB buffer. Perhaps
the two schedulers interact with one another, and this results in huge
delays for particular block reads/writes although most are done quickly.
iostat does not show very long average wait times (await) when things are
slow (~1000 ms).
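
An average await of around 1000 ms is not necessarily at odds with a single
command stalling for 30 seconds, because a mean can hide a few very slow
requests among many fast ones. The short Python sketch below illustrates the
arithmetic; the latency values are invented for the example and are not
measurements from this system.

```python
def summarize(latencies_ms):
    """Return (mean, median, worst) for a list of request latencies in ms."""
    ordered = sorted(latencies_ms)
    mean = sum(ordered) / len(ordered)
    median = ordered[len(ordered) // 2]  # upper median for even-length lists
    return mean, median, ordered[-1]


# Hypothetical one-second sample: 29 requests served quickly,
# one request starved for 30 seconds.
latencies = [10] * 29 + [30000]
mean, median, worst = summarize(latencies)
print(f"mean={mean:.0f} ms  median={median} ms  worst={worst} ms")
```

With these made-up numbers the mean comes out close to 1000 ms even though
the typical request takes 10 ms and one request takes 30 seconds, which is
the pattern the starvation theory above would predict.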

If I had the time I would investigate further, but the system is fine with
a different disk at the moment.

-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines

