Making every row unique across multiple files

There are times when we want rows to be unique across several files. A good example is when we have email addresses that span multiple CSV files and we don't want any address to appear more than once. I wrote this script and it seems to do the job well. Depending on the size of the files, it might take a long time to process, so you have been warned!
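
For example, suppose we have two mailing-list exports that share an address (the file names and addresses here are made up for illustration):

$ cat list1.csv
alice@example.com
bob@example.com
$ cat list2.csv
bob@example.com
carol@example.com

After processing, bob@example.com should survive in only one of the two files.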

I called this program “uniqrow”:

#!/bin/bash

# remove duplicate rows within each file first
for f in "$@"; do
  awk '!seen[$0]++' "$f" > "$$.tmp" && mv "$$.tmp" "$f"
done

# now compare each file against every later one
for (( i=1; i<$#; i++ )); do
  # indirect expansion: the i-th positional argument
  CURR=${!i}
  echo "processing $CURR ..."

  for (( j=i+1; j<=$#; j++ )); do
    NEXT=${!j}
    echo "removing rows shared with $NEXT from $CURR ..."

    # keep only the rows of CURR that are not whole-line matches
    # of any row in NEXT (-F fixed strings, -x whole line, -v invert)
    grep -Fxv -f "$NEXT" "$CURR" > "$$.tmp"
    mv "$$.tmp" "$CURR"
  done
done

echo "All your files now have unique rows!"
exit 0
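
To use it, make the script executable and pass it the files to clean up in place (the file names here are placeholders):

$ chmod +x uniqrow
$ ./uniqrow list1.csv list2.csv list3.csv

Note that a row shared between two files is removed from the earlier file and kept in the later one. If the pairwise passes are too slow for your data, a rough single-pass awk sketch like this keeps the first occurrence of each row instead, writing a cleaned copy of each file with a .uniq suffix (the suffix is just an assumed naming convention):

awk '!seen[$0]++ { print > (FILENAME ".uniq") }' list1.csv list2.csv list3.csv

Because seen[] is shared across all the input files in one awk run, a row already printed from an earlier file is skipped in every later one.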

Author: bpeh

Bernard Peh is passionate about web technologies and is one of the co-founders of Sitecritic.net Website Design and Reviews. He works with experienced web designers and developers every day, developing and designing commercial websites. He specialises mainly in SEO and PHP programming.