Monday, April 17, 2017

Show duplicate files in a directory

diff between two files is powerful, but sometimes you want to see which files are the same among many. In principle diff -s would do this, but you do not really need the detailed scanning of diff, and it takes a lot of time when there are many files.

In such cases you can use cmp, which stops at the first differing byte and exits with status 0 when the files are the same; with -s it prints nothing either way. In my case there had been some manual file copying and renaming over the days; it took me a while to see that I had created several duplicates.
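To make the exit-status idea concrete, here is a minimal sketch. The function name same_content is made up for illustration; only cmp -s itself is the real tool.

```shell
# Hypothetical helper (name is an assumption): compare two files by exit status only.
same_content() {
  # cmp -s prints nothing; status 0 means the files are byte-for-byte identical.
  if cmp -s "$1" "$2"; then
    echo "same"
  else
    # Non-zero status: the files differ, or one of them could not be read.
    echo "different"
  fi
}
```

You would call it as same_content file1 file2 and read the single word it prints.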

Here is the one-liner I used to find the duplicates:

for i in ./*; do for j in ./*; do if [ "$i" != "$j" ]; then cmp -s "$i" "$j" && echo " $i and $j are identical!"; fi; done; done

Simply put, i is the first file and j is the second file. The if-statement is there because a file is always identical to itself, so we skip the comparison when both names are the same. Then we compare with cmp -s, and if it returns zero the echo command is executed thanks to the && chain.
One problem with this one-liner is that it reports each duplicate pair twice, once as (i, j) and once as (j, i), but I did not have time to beautify that part. Let me know if you have a one-liner addition to fix that! Cheers!!
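One possible way around the double reporting, sketched here under some assumptions rather than as a tested replacement: compare each file only against the files that come after it in the glob listing, so every pair is visited once. The function name find_duplicate_pairs and the -f check that skips directories are my own additions, not part of the one-liner above.

```shell
# Hypothetical helper (name is an assumption): print each identical pair once.
# Usage: find_duplicate_pairs [directory]   (defaults to the current directory)
find_duplicate_pairs() {
  # Load the directory listing into the function's positional parameters.
  set -- "${1:-.}"/*
  while [ "$#" -gt 1 ]; do
    first=$1
    shift  # the files left in "$@" all come after $first
    for other in "$@"; do
      # Only compare regular files; cmp -s exits 0 when they are identical.
      if [ -f "$first" ] && [ -f "$other" ] && cmp -s "$first" "$other"; then
        echo "$first and $other are identical!"
      fi
    done
  done
}
```

Running find_duplicate_pairs . in the directory from the post should list each pair a single time. It is still a quadratic number of cmp calls, so for very large directories a checksum-based approach would probably scale better.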
