bash - Run uniq on a CSV file, ignoring one column and preserving the highest row in the file


A data vendor I use has a bug, and they are taking a long time to fix it.

Here's a simplified version of the CSV files I receive from them:

    # cat new_data20130904.csv
    a,001,b,c,d
    e,002,f,g,h
    e,003,f,g,h
    i,004,j,k,l

Column 2 of rows 2 and 3 is unique, but the rest of the data is the same.

Row 3 should never have been created by the vendor. The bug has been acknowledged by the vendor and a fix is promised, but I don't expect it soon.

I need to parse and modify the CSV file so that it becomes:

    a,001,b,c,d
    e,002,f,g,h
    i,004,j,k,l

I want my code to be defensive and remove these falsely duplicated rows.

Ideally I'd use only Ubuntu/Debian builtins.

Initially, I thought that removing the second field and running the result through uniq would be a start:

    # cut -d, -f1,3- new_data20130904.csv | uniq
    a,b,c,d
    e,f,g,h
    i,j,k,l

But then I can't think of a way of adding column 2 back in, so I don't think that will help.

What about this?

    $ awk -F, '{if (a[$1]) next}a[$1]=$0' file
    a,001,b,c,d
    e,002,f,g,h
    i,004,j,k,l

Explanation

We store the first column in an array. If it is already in the array, we skip the record.

  • -F, sets the field delimiter to a comma.
  • {if (a[$1]) next} skips the record if its first field is already in the array.
  • a[$1]=$0 saves the line in array a, keyed by the first field, and prints the line (printing $0 is awk's default action, so it does not need to be written).
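
For readability, the same idea can be spelled out over several lines. This is only a sketch: the function name dedupe_first_col and the array name seen are made up for illustration, and it uses awk's `in` test instead of checking the stored value, which should behave the same on data like the sample above.

```shell
#!/bin/sh
# dedupe_first_col [FILE...]: hypothetical helper, not part of the original answer.
# Prints each comma-separated record whose first field has not appeared before.
dedupe_first_col() {
    awk -F, '
        # The first field was already recorded: skip this record.
        ($1 in seen) { next }
        # First time this key appears: remember it and print the whole line.
        { seen[$1] = 1; print }
    ' "$@"
}

# Sample data from the question, fed on standard input.
dedupe_first_col <<'EOF'
a,001,b,c,d
e,002,f,g,h
e,003,f,g,h
i,004,j,k,l
EOF
```

Run against the sample, this keeps the a, e,002 and i rows and drops e,003. Note that it only compares the first column, so two rows that share column 1 but differ elsewhere would also be collapsed.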

And how would I tweak this if the nth column needed to be ignored?

You can replace a[$1] with a[$n], where n is the number of the column to use as the key.
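
If the goal is literally to compare every column except one, a different approach from the one-liner is to build the key from all the other fields. The sketch below assumes that reading of the question; dedupe_ignoring_col, n, key and seen are hypothetical names, and it assumes plain CSV with no quoted commas.

```shell
#!/bin/sh
# dedupe_ignoring_col N [FILE...]: hypothetical helper for illustration.
# Keeps the first record for each distinct combination of all fields except field N.
dedupe_ignoring_col() {
    col=$1
    shift
    awk -F, -v n="$col" '
    {
        # Build a key from every field except the ignored one.
        key = ""
        for (i = 1; i <= NF; i++) {
            if (i != n) {
                key = key FS $i
            }
        }
        # A record with the same key was already printed: skip this one.
        if (key in seen) {
            next
        }
        seen[key] = 1
        print
    }' "$@"
}

# Ignore column 2 of the sample data from the question.
dedupe_ignoring_col 2 <<'EOF'
a,001,b,c,d
e,002,f,g,h
e,003,f,g,h
i,004,j,k,l
EOF
```

On the sample this prints the same three rows as before, but a row such as e,003,f,g,x would be kept because it differs outside column 2. Unlike uniq, this also catches duplicates that are not on adjacent lines.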

