Re: Disk write cache — Linux Kernel

>>>>> "Jeff" == Jeff Garzik <[email protected]> writes:

Jeff> Kenichi Okuyama wrote:
>>>>>>> "Jeff" == Jeff Garzik <[email protected]> writes:
>> 
>> 
Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
>> 
>>>> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>>>> 
>>>>> On Sun, 15 May 2005, Tomasz Torcz wrote:
>>>>> 
>>>>>> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>>>>>> 
>>>>>>>>>> However they've patched the FreeBSD kernel to
>>>>>>>>>> "workaround?" it:
>>>>>>>>>> ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch
>>>>>>>>> 
>>>>>>>>> That's a similar stupid idea as they did with the disk write
>>>>>>>>> cache (lowering the MTBFs of their disks by considerable
>>>>>>>>> factors, which is much worse than the power off data loss
>>>>>>>>> problem) Let's not go down this path please.
>>>>>>>> 
>>>>>>>> What wrong did they do with disk write cache?
>>>>>>> 
>>>>>>> They turned it off by default, which according to disk vendors
>>>>>>> lowers the MTBF of your disk to a fraction of the original
>>>>>>> value.
>>>>>>> 
>>>>>>> I bet the total amount of valuable data lost for FreeBSD users
>>>>>>> because of broken disks is much much bigger than what they
>>>>>>> gained from not losing in the rather hard to hit power off
>>>>>>> cases.
>>>>>> 
>>>>> Aren't I/O barriers a way to safely use write cache?
>>>>> 
>>>>> FreeBSD used these barriers (FLUSH CACHE command) long time ago.
>>>>> 
>>>>> There are rumors that some disks ignore FLUSH CACHE command just to
>>>>> get higher benchmarks in Windows. But I haven't heard of any proof.
>>>>> Does anybody know which companies fake this command?
>>>>> 
>>>> 
>>>> From a story I read elsewhere just a few days ago, this problem is
>>>> virtually universal even in the umpty-bucks 15,000 rpm scsi server 
>>>> drives.  It appears that this is just another way to crank up the 
>>>> numbers and make each drive seem faster than its competition.
>>>> 
>>>> My gut feeling is that if this gets enough ink to get under the drive
>>>> makers' skins, we will see the issuance of a utility from the makers
>>>> that will re-program the drives, thereby enabling the proper
>>>> handling of the FLUSH CACHE command.  This would be an excellent
>>>> chance IMO, to make a bit of noise if the utility comes out, but only
>>>> runs on windows.  In that event, we hold their feet to the fire (the
>>>> preferable method), or a wrapper is written that allows it to run on
>>>> any os with a bash-like shell manager.
>> 
>> 
>> 
Jeff> There is a large amount of yammering and speculation in this thread.
>> 
Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
>> 
>> 
>> Then it must be the file system that is not controlling this properly.
>> And because this is so widespread across Linux, there must be at least
>> one bug in the VFS (or there was, and everyone copied it).
>> 
>> At least, from:
>> 
>> http://developer.osdl.jp/projects/doubt/
>> 
>> there is a project named "diskio" which does black-box testing of this:
>> 
>> http://developer.osdl.jp/projects/doubt/diskio/index.html
>> 
>> And if we assume Read-after-Write access semantics for the HDD as the
>> way of "SURELY" checking the data image on the disk surface (checked
>> by the HDD, I mean), then on both SCSI and ATA, none of the file
>> systems pass the test.
>>
>> And I was wondering where the fault lies. The file system? The device
>> drivers for both SCSI and ATA? Or the criterion? From Jeff's point, it
>> seems to be the file system or the criterion...

Jeff> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE 
Jeff> command to be generated has only been present in the most recent 2.6.x 
Jeff> kernels.  See the "write barrier" stuff that people have been discussing.

Jeff> Furthermore, read-after-write implies nothing at all.  The only way
Jeff> you can be assured that your data has "hit the platter" is
Jeff> (1) issuing [FLUSH|SYNC] CACHE, or
Jeff> (2) using FUA-style disk commands

Jeff> It sounds like your test (or reasoning) is invalid.
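
For readers following along, here is a minimal user-space sketch of the distinction Jeff draws. It assumes a POSIX system; the helper names durable_write and read_back are hypothetical and appear nowhere in this thread. os.fsync() only asks the kernel to push the file to stable storage. Whether that turns into a [FLUSH|SYNC] CACHE command to the drive depends on the kernel version, the filesystem and its write barrier support, as noted above. The read-back helper is included only to show why it is weak evidence: a read issued right after a write can be answered from a cache (the OS page cache, or the drive's own cache) without touching the platter.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, then ask the kernel to flush it to stable storage.

    Hypothetical helper for illustration. fsync() is a request to the kernel;
    whether the drive's write cache is actually flushed depends on the kernel,
    the filesystem and the drive.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Blocks until the kernel reports the file data as written out.
        os.fsync(fd)
    finally:
        os.close(fd)


def read_back(path):
    """Read the file again.

    Getting the same bytes back does not show they reached the platter,
    because the read can be served from a cache.
    """
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    target = os.path.join(directory, "sample.bin")
    payload = b"data that should survive power loss"
    durable_write(target, payload)
    # This comparison succeeds whether or not the drive flushed its cache.
    print(read_back(target) == payload)
```

A black-box test that relies only on the final comparison would pass even on a system that never flushes the drive cache, which is one reading of the objection above.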

Thank you for your information, Jeff.

I did not see why my reasoning would be invalid, since the tests are
black-box tests and do not depend on the implementation.

But with your explanation and some logs, I see where to look.
I'll run the test on FreeBSD as soon as I have time.
If FreeBSD fails, there must be something wrong with the reasoning.

Thanks again for the great hint.
regards,
---- 
Kenichi Okuyama