Awk: get 2 columns and count repeated values in a new column
The following script is meant to create a new CSV from the gender and state columns, counting the repeated values and grouping them by state, but it doesn't seem to work correctly: the new CSV I get is empty.
Code
gawk -f scrt.awk ml1.csv > ml2.csv
Script
#!/usr/bin/awk -F
BEGIN { FS=OFS="," }
FNR>1 { counts[$12 OFS $9]++ }
END { for (i in counts) print i,counts[i] }
Input csv
nw,d,nm,year,date,mns,arm,age,gender,rc,city,state,sg
x,x,pac,2015,2015-01-02,sur,les,53,Male,A,Shelton,WA,x
x,x,ces,2015,2015-01-02,sur,les,53,Female,A,Shelton,WA,x
x,x,ret,2015,2015-01-06,sur,ml apon,53,Male,A,Shelton,OR,x
x,x,set,2015,2015-01-02,sur,les,47,Male,W,Aloha,OR,x
x,x,wem,2015,2015-01-04,sur,ml apon,32,Male,W,San Francisco,CA,x
Expected output
state,gender,count
WA,Male,1
WA,Female,1
OR,Male,2
CA,Male,1
Solution 1:[1]
I ran the following code
BEGIN { FS=OFS="," }
FNR>1 { counts[$12 OFS $9]++ }
END { for (i in counts) print i,counts[i] }
using gawk 4.2.1 on the following input
nw,d,nm,year,date,mns,arm,age,gender,rc,city,state,sg
x,x,pac,2015,2015-01-02,sur,les,53,Male,A,Shelton,WA,x
x,x,ces,2015,2015-01-02,sur,les,53,Female,A,Shelton,WA,x
x,x,ret,2015,2015-01-06,sur,ml apon,53,Male,A,Shelton,OR,x
x,x,set,2015,2015-01-02,sur,les,47,Male,W,Aloha,OR,x
x,x,wem,2015,2015-01-04,sur,ml apon,32,Male,W,San Francisco,CA,x
and got the output
CA,Male,1
WA,Male,1
OR,Male,2
WA,Female,1
which is equivalent to the desired
WA,Male,1
WA,Female,1
OR,Male,2
CA,Male,1
when you ignore the order of lines. awk does not guarantee any particular order when a for (i in counts) loop walks an array. If you wish to traverse the array in a certain order in GNU AWK, set PROCINFO["sorted_in"] inside BEGIN to one of the available predefined orders or, if none suits your needs, to the name of your own comparison function.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Daweo |