Re: New filesystem for Linux

Mikulas Patocka wrote:

> Hi,
>
>> On Thu, 2 Nov 2006, Mikulas Patocka wrote:
>>
>>> As my PhD thesis, I am designing and writing a filesystem, and it's now in a state that it can be released. You can download it from http://artax.karlin.mff.cuni.cz/~mikulas/spadfs/
>>
>> "Disk that can atomically write one sector (512 bytes) so that the sector
>> contains either old or new content in case of crash."

>> Well, maybe I am completely wrong, but as far as I understand, no current disk provides such a guarantee. After an interrupted write, a disk can contain:
>> - old data,
>> - new data,
>> - nothing (an unreadable sector: the result of an incomplete write and the disk's internal checksum failing for that sector; this happens especially often if you have frequent power outages).
>>
>> And some broken drives may also return something that they think is good data but really is not (this shouldn't happen, since both disks and cables are protected by checksums, but hey... you can never be absolutely sure, especially on very large storage).
>>
>> So... isn't this making your filesystem a little flawed in design?
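
A filesystem can at least turn the "nothing" case - and any torn write - into a detectable error by embedding a checksum in each sector. A minimal sketch in C (not SpadFS's actual on-disk format, just an illustration of the technique):

/*
 * Minimal sketch: a 512-byte sector that carries its own checksum, so a
 * torn or corrupted write is detected on the next read instead of being
 * silently trusted.  Assumes no struct padding (508 + 4 bytes).
 */
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

struct sector {
	uint8_t  payload[SECTOR_SIZE - sizeof(uint32_t)];
	uint32_t csum;		/* checksum over payload */
};

/* Toy checksum; a real filesystem would use CRC32 or better. */
static uint32_t sector_csum(const struct sector *s)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(s->payload); i++)
		sum = sum * 31u + s->payload[i];
	return sum;
}

/*
 * Nonzero if the sector is self-consistent, i.e. the write either fully
 * completed (new data) or never happened (old data).  A half-written
 * sector fails the check rather than being mistaken for valid data.
 */
static int sector_valid(const struct sector *s)
{
	return sector_csum(s) == s->csum;
}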


> There was discussion about it here some time ago, and I think the result was that the IDE bus is reset before the capacitors discharge and power is lost completely, so the disk has enough time to finish a sector --- but if you have a crap power supply (doesn't signal power loss), a crap motherboard (doesn't reset the bus) or a crap disk (doesn't respond to the reset), it can fail.

There are two very different classes of storage devices here. If you use a high-end array (EMC Clariion/Symm, IBM Shark, Hitachi, NetApp block storage, etc.), then once the target device acknowledges the write transaction, you have a hard promise that the data will persist across a power outage.

If you are using a commodity disk, then you really have to worry about how the drive's write cache handles your IO. These disks ack the write once they have stored the request in volatile memory, which can be lost on a power outage.

That is a reasonable trade-off for most end users (high performance, few power outages, some risk of data loss), but when data integrity is a hard requirement, people typically run with the write cache disabled.
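
On the application side, durability has to be requested explicitly in any case; on ATA disks the cache is typically switched off with hdparm -W0. A minimal sketch (the filename is made up), with the caveat that when the volatile write cache is enabled and barriers are absent, fsync() can return success while the data still sits in the drive's cache:

/*
 * Minimal sketch: requesting durability from user space.  fsync() pushes
 * the data out of the kernel to the device, but whether it also reaches
 * the platter depends on the write cache / barrier setup underneath.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char buf[] = "transaction record";
	int fd = open("journal.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
		perror("write");
		return 1;
	}
	/* "Make it durable" - honored only as far as the stack allows. */
	if (fsync(fd) < 0) {
		perror("fsync");
		return 1;
	}
	close(fd);
	return 0;
}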

The "write barrier" support that is in reiserfs, ext3 and xfs all provide something that is somewhere in the middle - good performance and cache flushes injected on transaction commits or application level fsync() commands.

I would not depend on an IDE bus reset or draining capacitors to destage data safely - in fact, I know that this routinely fails when we test with the write barrier on and off across power outages.

Modern S-ATA/ATA drives have 16 MB or more of write cache, and that is a lot of data to destage in those last few milliseconds ;-)
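
Back-of-envelope: destaging 16 MB at an assumed sustained 60 MB/s takes 16/60 s, roughly 270 ms, and that is the purely sequential best case - cached writes scattered across the platter add a seek each, while the energy left after a power cut buys you a few milliseconds at most.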


> BTW, reiserfs and xfs depend on this feature too; ext3 is the only one that doesn't.
>
> Mikulas
