Re: If not readdir() then what?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday April 11, [email protected] wrote:
> On Wed, Apr 11, 2007 at 07:15:57AM +1000, Neil Brown wrote:
> > > Neil, is this correct?
> > 
> > It's just NFS.  nfsd does open/getdents/close on every readdir
> > request.
> 
> That's... unfortunate.

I like to think of it as "challenging" :-)

> 
> 
> So the problem with nfsd doing an open/getdents/close on every readdir
> request is two-fold.  First of all, it means that you are incurring
> extra overhead by forcing ext3 to allocate, create, and free the
> red-black tree on every single NFS readdir request.  Secondly, it
> increases the chances of problems with the hash collision logic.

I guess we need a two-fold solution then  :-)

For the first:
  You are storing an internal tree representation of part of the
  directory attached to the 'struct file'.
  Would it be possible to store this attached to the page via the
  ->private pointer?  What would avoid the allocate/create/free
  overhead on every request.

  You suggest caching the open files in nfsd.  While that is probably
  possible (I've thought of it a number of times) is would also be
  quite complex, e.g. requiring some sort of call-back to close all
  those files when the filesystem is unexported.  And it is very easy
  to get caching heuristics wrong.  Leveraging the page-cache which is
  a very mature cache seems to make a lot of sense.

  The cursor would still be in the 'struct file', but if the second
  second fold works, that should be much less critical.

For the second.
  You say that you " would need at least 96 bits in order to make that
  guarantee; 64 bits of hash, plus a 32-bit count value in the hash
  collision chain".  I think 96 is a bit greedy.  Surely 48 bits of 
  hash and 16 bits of collision-chain-position would plenty.  You would
  need 65537 entries before a collision was even possible, and
  billions before it was at all likely. (How big does a set of 48bit
  numbers have to get before the probability that "No subset of 65536
  numbers are all the same" drops below 0.95?)

  This would really require that the collision-chain-index was stable
  across create/delete.  Doing that while you have the tree in the
  page cache is probably easy enough.  Doing it across reboots is
  probably not possible without on-disk changes.

So while my two-fold solution is not perfect(*), I think it is
substantially better than what we have now, and is extremely unlike to
ever show an observable problem.
(*)And perfection can only be achieved we two independent indexes.


This requires 64bit cookies which NFSv2 cannot handle.  I think it is
OK to say "tough".  We should probably make sure that READDIRv2 *always*
fails on a directory which is indexed, and possibly on a filesystem
which allows index directories.  People can always disable indexing if
they want a v2 export.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux