Re: Finding Duplicate Files

On Thu, 2008-03-13 at 21:25 +0000, Jonathan Roberts wrote:
> Hey all,
> 
> I've got into a bit of a muddle with my backups...more than a little in fact!
> 
> I have several folders each approx 10-20 Gb in size. Each has some
> unique material and some duplicate material, and it's even possible
> there's duplicate material in sub-folders too. How can I consolidate
> all of this into a single folder so that I can easily move the backup
> onto different mediums, and get back some disk space!?

find top-directory -type f -print0 | xargs -0 sha1sum | sort

This will take a while (!) but you'll get a list of all the files sorted
by their SHA1 checksums, so duplicates end up on adjacent lines (the
-print0/-0 pair keeps filenames with spaces from being mangled on the
way into xargs). You can assume that two files with the same checksum
have the same content (or run cmp on them if you're paranoid). If you
want to get fancy, use the checksum as an associative array index to
group the filenames in buckets (left as an exercise for the reader :-)
though there's a rough sketch below for the impatient.

poc

