--- On Tue, 2/23/10, Marko Vojinovic <vvmarko@xxxxxxxxx> wrote:

> From: Marko Vojinovic <vvmarko@xxxxxxxxx>
> Subject: Recursive comparing of files
> To: users@xxxxxxxxxxxxxxxxxxxxxxx
> Date: Tuesday, February 23, 2010, 5:31 PM
>
> Hi folks! :-)
>
> I have the following task: there are two directories on the disk,
> say a/ and b/, with various subdirectories and files inside. I need
> to find and erase all *duplicate* files, and after that all empty
> directories. The files may reside in different directories, may have
> different names, but if they have identical *contents*, file from b/
> branch should be deleted.
>
> Now, the directories that I have are rather large and I wouldn't want
> to go hunt for duplicates manually. Is there some tool that can at
> least identify and list duplicate files in some directory structure?
>
> I could think of an algorithm like:
>
> 1) list all files in all subdirectories of a/ along with their file size
> 2) do the same thing for files in b/
> 3) sort and compare lists, look for pairs of files with identical size
> 4) test each pair to see if the file content is the same, and if yes,
>    list them in the output
>
> I could probably be able to write a bash script which would do this,
> but I guess this problem is common and there are already some
> available tools which would do this for me. Any suggestions?
>
> Thanks, :-)
> Marko

There is a tool called fdupes. Read more about it here:

http://www.cyberciti.biz/faq/linux-unix-finds-duplicate-files-in-given-directories/

<quote>
You need to use a tool called fdupes. It will search the given path for
duplicate files. Such files are found by comparing file sizes and MD5
signatures, followed by a byte-by-byte comparison. fdupes is a nice tool
to get rid of duplicate files.
</quote>

Regards,
Antonio

--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
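
P.S. In case it helps to see the "size, then MD5, then byte-by-byte" idea
spelled out, here is a rough Python sketch of the four steps from the
original question. It is only an illustration and not how fdupes itself is
implemented. The function names (files_by_size, md5sum, find_duplicates)
and the directory names "a" and "b" are placeholders of mine. It only
lists matching pairs and deletes nothing.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def files_by_size(root):
    """Steps 1 and 2: map file size -> list of regular files under root."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.getsize(path)].append(path)
    return sizes


def md5sum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(dir_a, dir_b):
    """Return sorted (path_in_a, path_in_b) pairs whose contents are identical."""
    sizes_a = files_by_size(dir_a)
    sizes_b = files_by_size(dir_b)
    pairs = []
    # Step 3: only sizes that occur in both trees can contain duplicates.
    for size in sizes_a.keys() & sizes_b.keys():
        hashes_a = defaultdict(list)
        for path_a in sizes_a[size]:
            hashes_a[md5sum(path_a)].append(path_a)
        for path_b in sizes_b[size]:
            for path_a in hashes_a.get(md5sum(path_b), []):
                # Step 4: confirm with a full byte-by-byte comparison.
                if filecmp.cmp(path_a, path_b, shallow=False):
                    pairs.append((path_a, path_b))
    return sorted(pairs)


if __name__ == "__main__":
    # "a" and "b" are assumed to be the two directory trees to compare.
    for path_a, path_b in find_duplicates("a", "b"):
        print(f"{path_b} duplicates {path_a}")
```

Grouping by size first means that most files never get hashed at all, and
the final filecmp.cmp call guards against the unlikely case of two
different files sharing an MD5 sum. Review the printed list before
removing anything from b/.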