On Tue, 2010-05-18 at 16:49 -0400, aragonx@xxxxxxxxxx wrote:
> Hello all,
>
> I need some ideas.
>
> I have a backup server that contains 10 ext3 file systems, each with
> 12 million files scattered randomly over 4000 directories. The files'
> average size is 1MB.

So each filesystem is about 12*10^6 files * 1MB = 12*10^12 bytes, or 12
terabytes?

> Every day I expect to get 20 or so requests for files from this
> archive. The files were not stored in any logical structure that I
> can use to narrow down the search. This will be different moving
> forward, but it does not help me for the old data. Additionally,
> every day data is added and old data is removed to make space.
>
> So, now that you know a little about the environment, I need ideas on
> how to find the file I want to restore fast.
>
> Using find on the partition is slow.
>
> I thought about using find and piping the output to a file. I started
> it 50 minutes ago and it still isn't done on a single partition. Plus
> the file is currently about 1.3GB, and how would I maintain such a
> file?
>
> Would putting the file names + path in a database be faster?

You don't say what the file contents are like (text, structured data,
unstructured binary?), nor how you identify the file you want: is a
request equivalent to a text substring, a regular expression, or
something else? Knowing what the contents look like would help in
judging whether it's worth, say, generating a hash for subsections of
each file when it's stored. Alternatively, it could conceivably make
sense to search for strings on the raw disk and work backwards to the
files they belong to, who knows? In short, more info is needed to give
a sensible answer.
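To the database question, though: almost certainly yes. Walking each
tree once and querying an index afterwards will beat re-running find
for every one of the 20 daily requests. A minimal sketch, assuming
Python with its standard-library sqlite3 module; /archive/fs01, the
database path, and the table name are all made up for illustration:

#!/usr/bin/env python
# Sketch only: walk one archive filesystem and record every path in
# SQLite, so a restore request becomes a query instead of a find run.
import os
import sqlite3

db = sqlite3.connect('/var/tmp/file-index.db')
db.execute("CREATE TABLE IF NOT EXISTS files "
           "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL)")

for root, dirs, names in os.walk('/archive/fs01'):
    for name in names:
        p = os.path.join(root, name)
        try:
            st = os.lstat(p)
        except OSError:
            continue      # file removed while we were walking
        db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                   (p, st.st_size, st.st_mtime))

db.commit()
db.close()

A lookup is then a single query, e.g.
db.execute("SELECT path FROM files WHERE path LIKE ?", ('%foo%',)),
which scans rows in the database rather than stat()ing 12 million
inodes. Since data comes and goes daily, a nightly re-walk with INSERT
OR REPLACE (plus deleting rows whose paths no longer exist) would keep
it current, and it's far easier to maintain than a 1.3GB flat find
dump.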
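On the subsection-hash idea, the shape would be something like the
following, done at store time. The 64 KiB chunk size is an arbitrary
guess, and note the limitation that a fragment can only be matched if
it lines up with these boundaries; whether any of this pays for itself
depends entirely on the unanswered questions above.

# Sketch only: hash fixed-size sections of a file when it is stored, so
# a fragment seen later can be matched back to a path.
import hashlib

CHUNK = 64 * 1024

def chunk_hashes(path):
    """Yield (offset, md5 hex digest) for each CHUNK-sized section."""
    with open(path, 'rb') as f:
        offset = 0
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            yield offset, hashlib.md5(block).hexdigest()
            offset += len(block)

The resulting (path, offset, digest) triples could live in the same
SQLite database as the path index.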
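And the raw-disk route, purely for illustration: scan the block device
read-only for a distinctive byte string, convert each match offset to a
filesystem block number, and map blocks back to inodes and paths with
debugfs's icheck and ncheck commands. The device name, needle, and
block size below are assumptions (check the real block size with
dumpe2fs -h), and expect a pass over 12 TB to take many hours.

# Sketch only: print the fs block number of each hit, then recover the
# owning file with:
#   debugfs -R 'icheck <block>' /dev/sdb1   -> inode number
#   debugfs -R 'ncheck <inode>' /dev/sdb1   -> path
DEVICE = '/dev/sdb1'
NEEDLE = b'some distinctive string'
BLOCKSIZE = 4096
READSIZE = 1024 * 1024

with open(DEVICE, 'rb') as dev:
    offset = 0    # device bytes consumed so far
    tail = b''    # carry-over so matches can span read boundaries
    while True:
        buf = dev.read(READSIZE)
        if not buf:
            break
        hay = tail + buf
        i = hay.find(NEEDLE)
        while i != -1:
            pos = offset - len(tail) + i    # absolute byte offset
            print(pos // BLOCKSIZE)         # block number for icheck
            i = hay.find(NEEDLE, i + 1)
        tail = hay[-(len(NEEDLE) - 1):] if len(NEEDLE) > 1 else b''
        offset += len(buf)

poc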