Re: O_DIRECT question

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 13 Jan 2007, Bill Davidsen wrote:

> Bodo Eggert wrote:
> 
> > (*) This would allow fadvise_size(), too, which could reduce fragmentation
> >     (and give an early warning on full disks) without forcing e.g. fat to
> >     zero all blocks. OTOH, fadvise_size() would allow users to reserve the
> >     complete disk space without his filesizes reflecting this.
> 
> Please clarify how this would interact with quota, and why it wouldn't 
> allow someone to run me out of disk.

I fell into the "write-will-never-fail"-pit. Therefore I have to talk 
about the original purpose, write with O_DIRECT, too.

- Reserved blocks should be taken out of the quota, since they are about
  to be written right now. If you emptied your quota doing this, it's
  your fault. It it was the group's quota, just run fast enough.-)

- If one write failed that extended the reserved range, the reserved area 
  should be shrunk again. Obviously you'll need something clever here.
  * You can't shrink carelessly while there are O_DIRECT writes.
  * You can't just try to grab the semaphore[0] for writing, this will 
    deadlock with other write()s.
  * If you drop the read lock, it will work out, because you aren't
    writing anymore, and if you get the write lock, there won't be anybody 
    else writing. Therefore you can clear the reservation for the not-
    written blocks. You may unreserve blocks that should stay reserved,
    but that won't harm much. At worst, you'll get fragmentation, loss
    of speed and an aborted (because of no free space) write command.
    Document this, it's a feature.-)

- If you fadvise_size on a non-quota-disk, you can possibly reserve it 
  completely, without being the easy-to-spot offender. You can do the
  same by actually writing these files, keeping them open and unlinking 
  them. The new quality is: You can't just look at the file sizes in
  /proc in order to spot the offender. However, if you reflect the 
  reserved blocks in the used-blocks-field of struct stat, du will
  work as expected and the BOFH will know whom to LART.

  BTW: If the fs supports holes, using du would be the right thing
  to do anyway.


BTW2: I don't know if reserving without actually assigning blocks is 
supported or easy to support at all. These reservations are the result of 
"These blocks are not yet written, therefore they contain possibly secret 
data that would leak on failed writes, therefore they may not be actually 
assigned to the file before write finishes. They may not be on the free 
list either. And hey, if we support pre-reserving blocks to the file, we 
may additionally use it for fadvise_size. I'll mention that briefly."




[0] r/w semaphore, read={r,w}_odirect, write=ftruncate

-- 
Fun things to slip into your budget
Paradigm pro-activator (a whole pack)
	(you mean beer?)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux