Re: Max Files Per Directory

At 12:30 PM -0700 7/16/05, Mike McMullen wrote:
>----- Original Message -----
>From: "Tony Nelson" <tonynelson@xxxxxxxxxxxxxxxxx>
>
>
>At 8:53 AM -0400 7/16/05, Matthew Miller wrote:
>>On Sat, Jul 16, 2005 at 08:04:22AM -0400, fredex wrote:
>>> Now, for performance reasons, it is often not a good idea to have
>>> many thousands of files in a single directory. As the number of files
>>> grows large, the time it takes to access a file grows with it. I haven't
>>> looked into this on any Linux file system, but on other Unixes I've
>>> observed delays reaching up into the whole-second range when many
>>> thousands of files are in a single directory.
>>
>>Shouldn't be a problem on a new install of ext3 on modern Linux. See
>><http://lwn.net/Articles/11481/>.
>
>I suppose the simple test is to try it.  The table below gives the time
>(microseconds per file) for a script that creates the files by touching
>them, runs ls on a few names near the end, and then removes them all; see
>the script at the end of the transcript.
>
>#files    touch    ls       rm      (all times in µs per file)
>------    -----    -----    -----
>  1000      309      016      054
> 10000      331      008      051
>100000      326      007      357
>200000      330      007     1065
>
>I didn't want to try a million files.  Creation and listing seem to be
>linear in the number of files (roughly constant time per file), but rm
>looks quadratic (per-file time growing with the count).  I think this
>indicates that in my current 2.6.12 FC3 kernel using Ext3, the directory
>data structure is still a list and not a tree.
>
>I expect that some sort of directory hierarchy to limit the number of files
>per directory would still be a win.
>
>Transcript follows.  Max line width 114 chars.
>
>------ End of Original Message -------
>
>Tony and Matthew, thanks very much for your help!!!
>
>This confirms roughly what I thought would be the case. Tony, thanks
>especially for going to the trouble of testing it!

You're welcome.

I've had a few more thoughts on the matter; they don't change the bottom line.

>#files    touch    ls       rm      (all times in µs per file)
>------    -----    -----    -----
>  1000      309      016      054
> 10000      331      008      051
>100000      326      007      357
>200000      330      007     1065
>
> ...  Creation and listing seem to be
>linear in the number of files (roughly constant time per file), but rm
>looks quadratic (per-file time growing with the count).  I think this
>indicates that in my current 2.6.12 FC3 kernel using Ext3, the directory
>data structure is still a list and not a tree.

I feel the need to be clearer here.  Adding an entry to a tree should be
roughly O(log n) per file, while adding an entry to a linked list can be
O(1), which is what I see above.  A directory listing might be O(1) per
file with either type of data structure.  Deleting files should be
O(log n) per file for a tree (so each doubling of the directory adds only
a constant amount of time per file), and can easily be O(n) per file for
a list, since each unlink has to search the whole directory; that is
similar to what I see above.  I have since tried adding entries not in
sorted order, with no significant difference in timing.  Thus, I think
that on Ext3, directories are still kept as unordered linked lists, not
as trees.  I could certainly be wrong.  There's always the source.
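
For what it's worth, ext3 does have an optional htree directory index
(the "dir_index" feature), so maybe my filesystem just isn't using it.
Something like the following should show whether it's on; /dev/hda1
below is only a stand-in for whatever the real device is:

  # Show the filesystem feature list; look for "dir_index".
  dumpe2fs -h /dev/hda1 | grep -i features

  # Turn it on if it's absent.  Only directories created (or rebuilt,
  # e.g. by e2fsck -D) after this will actually get the htree index.
  tune2fs -O dir_index /dev/hda1

If the feature turns out to be off, the list-like rm behaviour above is
about what you'd expect.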

Still, the timings above support the idea that lots of files in one
directory make operations on it slow (removal most of all), and
presumably opening a file by name involves a similar search of the
directory, so keeping the number of files per directory modest is
probably a good idea.  Maybe I'll try that experiment as well.
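
For anyone who wants to reproduce the numbers, here's a rough sketch of
the kind of script I used (not the original, which got cut from this
message; the file count and directory name below are just placeholders):

  #!/bin/bash
  # Rough sketch of the timing test, not the original script.
  # Divide each elapsed time by N to get the per-file figure.
  N=10000             # number of files; placeholder value
  D=testdir           # scratch directory; placeholder name
  mkdir "$D"
  time for ((i = 0; i < N; i++)); do touch "$D/f$i"; done
  time ls "$D"/f999* > /dev/null     # list a few names near the end
  time rm -rf "$D"                   # remove everything

And since a directory hierarchy came up earlier, a sketch of one way to
do it: bucket each file into one of 256 subdirectories by the first two
hex digits of an md5 of its name, so no single directory grows huge (the
"place" helper is made up for illustration):

  # Hypothetical helper: testdir/3f/somefile instead of testdir/somefile.
  place() {
      h=$(printf '%s' "$1" | md5sum | cut -c1-2)
      mkdir -p "$D/$h"
      touch "$D/$h/$1"
  }
  # Usage:  place somefile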

Let me know if I'm boring you all, and let's all give a big "Hello!" to
Peter Whalley.
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson@xxxxxxxxxxxxxxxxx>
      '                              <http://www.georgeanelson.com/>

