Re: Find duplicated files


 



> > > Is there any way I can know which files are
> > > duplicated in some directories?
> >
> > One approach is to use find, then use md5sum, then use
> > sort on the output of md5sum, then look for duplicate
> > md5's.
> >
> >   $ find /home -type f -print0 \
> >     | xargs -0 md5sum \
> >     | sort ... \
> >     | less
>
> Not a bad idea.
>
> The only catch with this good idea is that it needs a bit more
> scripting to actually pick out the duplicate md5sums. A huge
> directory tree will definitely be an issue.

In a two-step process, you could do:

  find . -type f -print0 | \
  xargs -0 md5sum | \
  sort | \
  cut -c1-32 | \
  uniq -d

(cut strips off the file names, which will most probably differ at least in
their paths, and uniq -d lists only the md5sums that appeared more than
once.)  Now, for each hash returned, run the find again, this time piping
through grep; with egrep you could even match them all at once:

  find . -type f -print0 | \
  xargs -0 md5sum | \
  sort | \
  egrep '(<val1>|<val2>|...)'
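If you'd rather avoid the second pass entirely, here is a one-pass sketch
using awk instead of cut/uniq/egrep (the awk script and the test files are
mine, not from the original pipelines; it assumes file names contain no
newlines, since md5sum output is line-based):

  find . -type f -print0 \
    | xargs -0 md5sum \
    | sort \
    | awk 'seen[$1]++ { if (seen[$1] == 2) print prev[$1]; print }
           { prev[$1] = $0 }'

For each line, $1 is the hash; the first time a hash repeats, awk prints
the remembered first occurrence and then every further file with that
hash, so only duplicated files appear in the output.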

--Marcin


