Making every row unique across multiple files

There are times when we want rows to be unique across several files. A good example is a list of email addresses that spans multiple CSV files, where we don't want any address repeated. I wrote this script and it seems to do the job well. Depending on the size of the files, it might take a long time to process, since every pair of files gets compared – you have been warned!
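
The heavy lifting in the script below is the classic awk idiom !seen[$0]++, which prints a line only the first time it appears. A quick illustration (the file name is made up):

$ printf 'a@example.com\nb@example.com\na@example.com\n' > demo.csv
$ awk '!seen[$0]++' demo.csv
a@example.com
b@example.com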

I called this program "uniqrow":

#!/bin/bash

# De-duplicate every file internally first: the awk idiom
# '!seen[$0]++' prints a line only the first time it is seen
for f in "$@"; do
  awk '!seen[$0]++' "$f" > "$f.$$.tmp"
  mv "$f.$$.tmp" "$f"
done

# now loop over every pair of files
for (( i=1; i<$#; i++ )); do
  # the i-th positional argument
  CURR=${!i}
  echo "processing $CURR ..."

  for (( j=i+1; j<=$#; j++ )); do
    # the j-th positional argument
    NEXT=${!j}
    echo "removing rows shared with $NEXT from $CURR ..."

    # -F fixed strings, -x whole-line match, -v invert the match,
    # -f take the patterns from NEXT; keeps only the rows of CURR
    # that do not appear anywhere in NEXT
    grep -Fxvf "$NEXT" "$CURR" > "$CURR.$$.tmp"
    mv "$CURR.$$.tmp" "$CURR"
  done
done

echo "All your files now have unique rows!"
exit 0
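
A usage sketch, assuming three hypothetical CSV files of email addresses:

chmod +x uniqrow
./uniqrow list1.csv list2.csv list3.csv

When a row appears in more than one file, each comparison deletes it from the earlier file, so it survives only in the last file that contains it.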
