There are times when we want unique rows across different files. A good example is a set of e-mail addresses that span multiple CSV files, where we don't want any address duplicated. I wrote this script and it seems to do the job well. Depending on the size of the files, it may take a long time to run – you have been warned!
I called this program “uniqrow”
#!/bin/bash
# uniqrow - remove rows that are duplicated across the files given as arguments

# Deduplicate the first file in place
awk '!x[$0]++' "$1" > "$$.tmp"
mv "$$.tmp" "$1"

# Compare every file against every later file
for (( i=1; i<$#; i++ ))
do
    # Get the i-th positional parameter (file name) via indirect expansion
    CURR="${!i}"
    echo "processing $CURR ..."
    for (( j=i+1; j<=$#; j++ ))
    do
        # Get the j-th positional parameter
        NEXT="${!j}"
        echo "removing rows from $CURR that also appear in $NEXT ..."
        while read -r a; do
            while read -r b; do
                if [ "$a" = "$b" ]; then
                    # Delete the duplicate row from CURR. grep -Fxv keeps
                    # every line that is NOT a literal whole-line match,
                    # which is safer than sed with an unescaped pattern
                    # (e-mail addresses contain regex metacharacters like ".")
                    grep -Fxv "$b" "$CURR" > "$$.tmp"
                    mv "$$.tmp" "$CURR"
                    break
                fi
            done < "$NEXT"
        done < "$CURR"
    done
done

echo "All your files now have unique rows!!"
exit 0

Note that the shebang is #!/bin/bash rather than #!/bin/sh, because the C-style for loops and ${!i} indirect expansion are bash features and will fail under a plain POSIX shell.
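Usage is just the script name followed by the files to scrub (the file names here are hypothetical):

./uniqrow list1.csv list2.csv list3.csv

On the speed warning above: the nested read loops make this quadratic in the number of rows, so huge files really will crawl. If GNU awk 4.1 or later is available, a single pass gets a very similar result. This is a minimal sketch, assuming gawk's bundled "inplace" extension is installed; note it keeps each row in the *first* file it appears in (the script above keeps it in the last), and it also deduplicates rows within every file, not just the first:

# seen[] persists across input files, so any row already printed
# from an earlier file (or earlier in the same file) is dropped;
# -i inplace rewrites each file with only the lines that survive
gawk -i inplace '!seen[$0]++' list1.csv list2.csv list3.csv

If which file keeps the shared row doesn't matter to you, this one-liner is a drop-in replacement and runs in roughly linear time.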