Re: If not readdir() then what?

On Thursday April 12, [email protected] wrote:
> On Thu, 12 April 2007 11:46:41 +1000, Neil Brown wrote:
> > 
> > I could argue that nfs came before ext3+dirindex, so ext3 should have
> > been designed to work properly with NFS.  You could argue that fixing
> > it in nfsd fixes it for all filesystems.  But I'm not sure either of
> > those arguments are likely to be at all convincing...
> 
> Caring about a non-ext3 filesystem, I sure would like an nfs solution as
> well. :)

I have a non-ext3 filesystem I care about too.....

But my perspective is that a solution in nfsd is at best a work-around.
Caching the whole 'struct file' when there is just a small bit that we
might want seems like a heavy hammer.  The filesystem is in the best
place to know what needs to be cached, and it should be the one doing
the caching.

> 
> > Hmmm. I wonder.  Which is more likely?
> >   - That two 64bit hashes from some set are the same
> >   - or that 65536 48bit hashes from a set of equal size are the same.
> 
> The former.  Each bit going from hash strength to collision chain length
> reduces the likelihood of an overflow.  In the extreme case of a 0bit
> hash and 64bit collision chain, you need 2^64 entries compared to 2^32
> for the other extreme.
> 
> However, the collision chain gives me quite a bit of headache.  One
> would have to store each entry's position on the chain, deal with older
> entries getting deleted, newer entries getting removed, etc.  All this
> requires a lot of complicated code that basically never gets tested in
> the wild.

This is a simple consequence of the design decision to use hashes as
the search key.  They aren't dense and they will collide.  So the
solution will be a bit fuzzy around the edges.  And maybe that is an
acceptable tradeoff.  But the filesystem should take full
responsibility for it, whether in performance or correctness :-)

> 
> Just settling for a 64bit hash and returning -EEXIST when someone causes
> a collision on creat() sounds more appealing.  Directories with 4
> billion entries will cause problems, but that is hardly news to anyone.
> 
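
For a sense of scale on the "4 billion entries" figure, here is a rough
sketch of the birthday approximation.  It assumes the hash is uniformly
distributed, which a real directory hash only approximates, and the
function name is made up for illustration.

```python
import math


def collision_probability(n, bits):
    """Approximate chance that at least two of n names share a hash.

    Birthday approximation: P ~= 1 - exp(-n*(n-1) / 2**(bits+1)).
    Assumes a uniformly distributed hash (an idealisation).
    """
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    for n in (10**6, 10**9, 2**32):
        print(n, collision_probability(n, 64))
```

With a 64bit hash the odds only become substantial once a single
directory holds on the order of 2^32 names.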

I think you want -EFBIG or -ENOSPC.  -EEXIST sounds just wrong.

But there are alternatives.  e.g. internal chaining.
Insist on a unique 64bit hash for every file.  If the hash is in use,
increment and try again.  On lookup, if the hash leads you to a file
with the wrong name, increment and try again until you find a hole
(hash value that is not stored).  When you delete an entry, leave a
place holder if the next hash is in use.  Conversely if the next hash
is not in use, delete the entry and delete the previous one if it is a
place holder.
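
A minimal sketch of that scheme, in Python rather than kernel C, using
an in-memory dict in place of on-disk structures.  The class and method
names are invented for illustration, the hash function is a stand-in,
and letting insert reuse a place holder slot is my assumption, not
something spelled out above.

```python
import hashlib

MASK64 = (1 << 64) - 1
PLACEHOLDER = object()  # deleted slot kept so a chain stays walkable


def default_hash(name):
    # Stand-in 64bit hash; a real filesystem would choose its own.
    digest = hashlib.blake2b(name.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class Directory:
    def __init__(self, hash_fn=default_hash):
        self.hash_fn = hash_fn
        self.slots = {}  # unique 64bit key -> name or PLACEHOLDER

    def lookup(self, name):
        """Return the key holding name, or None once a hole is reached."""
        key = self.hash_fn(name) & MASK64
        while key in self.slots:
            if self.slots[key] == name:
                return key
            key = (key + 1) & MASK64  # wrong name or place holder: go on
        return None

    def insert(self, name):
        """Store name at the first usable key at or after its hash."""
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        key = self.hash_fn(name) & MASK64
        # Assumption: a place holder slot may be reused by a new entry.
        while key in self.slots and self.slots[key] is not PLACEHOLDER:
            key = (key + 1) & MASK64
        self.slots[key] = name
        return key

    def delete(self, name):
        key = self.lookup(name)
        if key is None:
            raise FileNotFoundError(name)
        if ((key + 1) & MASK64) in self.slots:
            # Next hash is in use: leave a place holder behind.
            self.slots[key] = PLACEHOLDER
            return
        # Next hash is free: drop the entry and any place holders
        # immediately before it.
        del self.slots[key]
        prev = (key - 1) & MASK64
        while self.slots.get(prev) is PLACEHOLDER:
            del self.slots[prev]
            prev = (prev - 1) & MASK64
```

The key returned by insert is what would be handed out as the readdir
cookie.  The sketch ignores the case of a completely full key space,
where the loops would never find a hole.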

Then you get 100% correct semantics and a performance hit in the face
of hash collisions that is probably no worse than that which ext3
currently gets.  It probably does cost you a bit of storage to store
those 64bit hashes, though I suspect some clever compression can help
out there (You only need one bit more than the filename when there is
no chaining).

You have to require 64bit cookies/fpos, but I think that today, that
is a reasonable thing to require (5 years ago it might not have been).

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
