> On Tue, 2010-05-18 at 16:49 -0400, aragonx@xxxxxxxxxx wrote:
>> Hello all,
>>
>> I need some ideas.
>>
>> I have a backup server that contains 10 ext3 file systems, each with 12
>> million files scattered randomly over 4000 directories. The average
>> file size is 1MB.
>
> So each filesystem is about 12*10^6 * 1MB = 12*10^12 bytes, or 12 terabytes?

Each filesystem is 2.5TB, so the average file size must be much smaller.
At last count, one of the filesystems contained 20 million files.

> You don't say what the file contents are like, e.g. text, structured
> data, unstructured binary, etc, nor do you say how you match the file
> you want (e.g. is it equivalent to a text substring, a regular
> expression, or what?). Knowing what the contents look like would help to
> evaluate if it's worth e.g. generating a hash for subsections of the
> file when it's being stored. Alternatively, it could conceivably make
> sense to search for strings in the raw disk and work backwards to
> calculate what files they belong to, who knows?

The data in the files is of the unstructured binary type. When I do a
search, I have _most_ of the file name, enough to uniquely identify it.

I hope that helps.

---
Will Y.
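Because the search key is most of a file name rather than file contents, one possible approach is to walk each filesystem once, save every path to a flat index file, and then match the known fragment against that index instead of the live directory tree. The sketch below is only an illustration of that idea: the function names `build_index` and `search_index` and the one-path-per-line index layout are assumptions made here, not anything described in the thread, and it assumes file names contain no newline characters.

```python
import os


def build_index(roots, index_path):
    """Walk each root once and write one full path per line to index_path.

    Assumption: no file name contains a newline character.
    Returns the number of files recorded.
    """
    count = 0
    with open(index_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for root in roots:
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    out.write(os.path.join(dirpath, name) + "\n")
                    count += 1
    return count


def search_index(index_path, fragment):
    """Return every indexed path whose file name contains fragment."""
    matches = []
    with open(index_path, "r", encoding="utf-8", errors="surrogateescape") as idx:
        for line in idx:
            path = line.rstrip("\n")
            # Compare against the file name only, not the directory part.
            if fragment in os.path.basename(path):
                matches.append(path)
    return matches
```

The index would be rebuilt after each backup run, for example by calling `build_index` with the list of filesystem mount points, and a lookup is then a single sequential read of one file via `search_index(index_path, "known-part-of-name")`. The standard `locate` and `updatedb` tools are built around a similar path database and may be worth trying first.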