bash - run uniq on a csv file, ignoring a column, preserving the highest row in the file
A data vendor we use has a bug, and it is taking them a long time to fix it.
Here's a simplified version of the csv files we receive from them:
# cat new_data20130904.csv
a,001,b,c,d
e,002,f,g,h
e,003,f,g,h
i,004,j,k,l
Column 2 of rows 2 and 3 is unique, but the rest of the data is the same.
Row 3 should never have been created by the vendor. The bug has been acknowledged by the vendor and a fix promised, but I don't expect it any time soon.
I need to parse and modify the csv file so that it becomes:
a,001,b,c,d
e,002,f,g,h
i,004,j,k,l
I want the code to defensively remove these falsely duplicated rows.
Ideally I'd like to stick to Ubuntu/Debian builtins.
Initially, I thought that removing the second field and running the result through uniq would be a start:
# cut -d, -f1,3- new_data20130904.csv | uniq
a,b,c,d
e,f,g,h
i,j,k,l
But I can't think of a way of adding column 2 back in afterwards, so I don't think that will help.
What about this?
$ awk -F, '{if (a[$1]) next} a[$1]=$0' file
a,001,b,c,d
e,002,f,g,h
i,004,j,k,l
Explanation

We store the first column in an array. If it is already in the array, we skip the record.

-F,
    sets the field delimiter to a comma.
{if (a[$1]) next}
    if the first field is already in the array, skip the record.
a[$1]=$0
    stores the record in the array a under the first field as the key, and prints the line (print $0 is the default behaviour of awk, so it does not need to be written).
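
If you want to be extra defensive and only treat rows as duplicates when every field except column 2 matches (rather than keying on column 1 alone), a minimal sketch for the five-column example above, using the same awk idea (the seen array name is my own, not part of the original answer):

$ awk -F, '!seen[$1 FS $3 FS $4 FS $5]++' new_data20130904.csv   # key = columns 1,3,4,5; print the first occurrence only
a,001,b,c,d
e,002,f,g,h
i,004,j,k,l

This keeps column 2 in the output while ignoring it for the comparison, which is what the cut -f1,3- attempt in the question was aiming at.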
And how would you tweak this if the nth column needed to be ignored?

You can replace a[$1] with a[$n], where n is the column.
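
If "ignored" is meant literally, i.e. you want to compare every field except column n rather than key on column n, a sketch of one way to do it (my own, not from the answer; the skip variable and seen array are assumed names):

awk -F, -v skip=2 '
    {
        key = ""
        for (i = 1; i <= NF; i++)   # walk every field
            if (i != skip)          # except the one we want to ignore
                key = key FS $i     # and append it to the key
    }
    !seen[key]++                    # print only the first row seen for each key
' new_data20130904.csv

With skip=2 this prints the same three lines as above; changing skip lets you ignore any other column instead.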