At 12:30 PM -0700 7/16/05, Mike McMullen wrote: >----- Original Message ----- >From: "Tony Nelson" <tonynelson@xxxxxxxxxxxxxxxxx> > > >At 8:53 AM -0400 7/16/05, Matthew Miller wrote: >>On Sat, Jul 16, 2005 at 08:04:22AM -0400, fredex wrote: >>> Now, for performance reasons, it is often not a good idea ot have >>> many thousands of files in a single directory. As the number of files >>> grows large the time it takes to access a file grows larger. I haven't >>> looked into this on any linux file system, but on other unixes I've >>> observed delays reaching up into the whole-second region when many >>>thousands >>> of files are in a single directory. >> >>Shouldn't be a problem on a new install of ext3 on modern Linux. See >><http://lwn.net/Articles/11481/>. > >I suppose the simple test is to try it. Time (microseconds) per file to run >a script that creates files by touching them, to ls a few near the end, and >to remove all the files; see script at end of transcript. > >#files touch ls rm >------ ----- ----- ----- > 1000 309 016 054 > 10000 331 008 051 >100000 326 007 357 >200000 330 007 1065 > >I didn't want to try a million files. Creation and listing seem to be >linear with the number of files, but rm seems quadratic. I think this >indicates that in my current 2.6.12 FC3 kernel using Ext3 the directory >data structure is still a list and not a tree. > >I expect that some sort of directory hierarchy to limit the number of files >per directory would still be a win. > >Transcript follows. Max line width 114 chars. > >------ End of Original Message ------- > >Tony and Mathew thanks very much for your help!!! > >This confirms roughly what I thought would be the case. Tony, thanks >especially for going to the trouble of testing it! You're welcome. I've had a few more thoughts on the matter; they don't change the bottom line. >#files touch ls rm >------ ----- ----- ----- > 1000 309 016 054 > 10000 331 008 051 >100000 326 007 357 >200000 330 007 1065 > > ... Creation and listing seem to be >linear with the number of files, but rm seems quadratic. I think this >indicates that in my current 2.6.12 FC3 kernel using Ext3 the directory >data structure is still a list and not a tree. I fee the need to be clearer here. Adding an entry to a tree should be roughly O(log(n)), while adding an entry to a linked list can be O(1), which is what I see above. A directory listing might be 0(1) per file with either type of data structure. Deleting files should be 0(log(n)) per file for a tree (about 40% more time per file for doubling), and can easily be 0(n) per file for a list, which is similar to what I see above. I have since tried adding entries not in sorted order, with no significant difference in timing. Thus, I think that on Ext3, directories are still kept as unordered linked lists, not as trees. I could certainly be wrong. There's always the source. Still, the timings above support the idea that lots of files make adding and listing slow, and presumably opening a file would be like listing, so keeping the number of files per directory modest is probably a good idea. Maybe I'll try that experiment as well. Let me know if I'm boring you all, and let's all give a big "Hello!" to Peter Whalley. ____________________________________________________________________ TonyN.:' <mailto:tonynelson@xxxxxxxxxxxxxxxxx> ' <http://www.georgeanelson.com/>