On 03/13/2008 02:25:26 PM, Jonathan Roberts wrote:
> Hey all,
>
> I've got into a bit of a muddle with my backups...more than a little
> in fact!
>
> I have several folders each approx 10-20 Gb in size. Each has some
> unique material and some duplicate material, and it's even possible
> there's duplicate material in sub-folders too. How can I consolidate
> all of this into a single folder so that I can easily move the backup
> onto different mediums, and get back some disk space!?

Here's a Perl script that compares all files of the same size and prints
commands to eliminate all but one of those that compare equal. You'll need
to modify it (or consolidate your sub-folders first) to deal with subdirs.
File::Slurp is not part of the standard distribution; its rpm is
perl-File-Slurp.noarch.

#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

my $dir = q{/path-to-your-dir};

# Pair each file name with its size (field 7 of stat), then sort by size
# so files of equal size end up adjacent.
my @files = read_dir($dir);
@files = map { [ (stat "$dir/$_")[7], $_ ] } @files;
@files = sort { $a->[0] <=> $b->[0] } @files;

while ( @files ) {
    my $f = shift @files;
    last unless @files;
    my @dups = ( $f->[1] );
    # Walk the remaining files that have the same size as $f.
    while ( $f->[0] == $files[0]->[0] ) {
        my $s = shift @files;
        if ( system(qq{cmp -s $dir/$f->[1] $dir/$s->[1]}) != 0 ) {
            # Contents differ: start a new candidate group.
            $f = $s;
            @dups = ( $f->[1] );
        } else {
            # Contents match: record the duplicate.
            push @dups, $s->[1];
        }
        last unless @files;
    }
    if ( @dups > 1 ) {
        # Keep the first copy; print rm commands for the rest.
        shift @dups;
        print "rm $dir/$_\n" for @dups;
    }
}
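
If you'd rather handle the sub-folders in one pass instead of consolidating
them first, something along these lines might do it. This is only a sketch
of the same size-then-cmp idea, using File::Find (which is core Perl); the
path /path-to-your-dir is a placeholder, and like the script above it just
prints rm commands for you to review rather than deleting anything itself.

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $dir = q{/path-to-your-dir};

# Walk the whole tree and group every regular file by its size in bytes.
my %by_size;
find( { no_chdir => 1, wanted => sub {
    return unless -f $_;
    push @{ $by_size{ -s _ } }, $_;
} }, $dir );

for my $size ( keys %by_size ) {
    my @group = @{ $by_size{$size} };
    next if @group < 2;
    # Within a size group, compare each file against the ones already kept;
    # anything byte-identical to a kept file gets an rm command.
    my @keep;
    FILE: for my $file ( @group ) {
        for my $kept ( @keep ) {
            if ( system( 'cmp', '-s', $kept, $file ) == 0 ) {
                print "rm $file\n";
                next FILE;
            }
        }
        push @keep, $file;
    }
}

Using the list form of system() for cmp avoids shell quoting problems if
any file names contain spaces; the printed rm lines would still need
quoting by hand in that case.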