How to replace "00" in data with "N/A" skipping first row in sed?

I'm working with GWAS data. My data looks like this:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
300,TT,AA,00,AG,AA
400,GG,AG,00,GT,GG

Desired Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
300,TT,AA,N/A,AG,AA
400,GG,AG,N/A,GT,GG

I'm trying to replace "00" with "N/A". However, 00 also occurs inside the header row and in the first column (IID), so those get changed too: header names become kgp11N/A4425, rs11274N/A5, kgp183N/A5, and so on, and IID values such as 300, 400, 500 become 3N/A, 4N/A, 5N/A. The command I used:

sed 's~00~N/A~g' allSNIPsFinaldata.csv 

Can anyone please help me skip the header row and the first column when applying this substitution?



Solution 1:[1]

You may specify an address to select the line(s) to apply the command to. Thus you might choose to exclude the first line like this:

sed '1!s~00~N/A~g' allSNIPsFinaldata.csv

As a side note, your example isn't actually CSV despite the file name: the header is comma-delimited but the rest of the file uses spaces.
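As a quick check of the address form above, here is a minimal sketch on a made-up, comma-separated sample (the `sample` variable and its three lines are assumptions for illustration, not the asker's file). It shows that `1!` protects the header, but a 00 inside a first-column value such as 300 is still rewritten:

```shell
# Hypothetical three-line sample; not the real allSNIPsFinaldata.csv.
sample='IID,kgp11004425,rs11274005
1,00,AG
300,TT,00'

# "1!" negates the address: run the substitution on every line except line 1.
printf '%s\n' "$sample" | sed '1!s~00~N/A~g'
# IID,kgp11004425,rs11274005
# 1,N/A,AG
# 3N/A,TT,N/A
```

So the address alone handles the header row; keeping the first column intact needs a stricter pattern.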

Solution 2:[2]

With two capture groups, you can use this sed command:

sed -E 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g' file

IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       N/A           AG        GT            AK          N/A
32      AG           GG        AA            N/A          AT
98      TT           AA        N/A            AG          AA
3       GG           AG        N/A            GT          GG

Details:

  • (^|[[:blank:]]): Match the start of the line or a blank (space or tab) in capture group #1
  • 00: Match 00
  • ([[:blank:]]|$): Match a blank or the end of the line in capture group #2
  • \1N/A\2: Replacement that puts back capture group #1, followed by N/A, followed by capture group #2
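The pattern above keys on blanks. If the file really is comma-separated throughout, as the sample in the question is shown, the same idea could be adapted roughly as below. This is only a sketch under that assumption; `replace_missing` is a hypothetical helper name. Requiring a comma before 00 means the first column can never match, and `1!` leaves the header alone. The `:a` / `ta` loop repeats the substitution, because a plain `g` flag can miss the second of two adjacent 00 fields when the comma between them is consumed by the first match.

```shell
# Hypothetical helper: replace fields that are exactly "00", never on the
# header (line 1) and never in the first column (a comma must precede 00).
replace_missing() {
  sed -E -e ':a' -e '1!s~,00(,|$)~,N/A\1~' -e 'ta'
}

printf '%s\n' 'IID,kgp11004425,rs746410036' '300,00,00' '00,AG,00' | replace_missing
# IID,kgp11004425,rs746410036
# 300,N/A,N/A
# 00,AG,N/A
```

Note that trailing whitespace after a final 00 field would prevent the `$` alternative from matching, so it may need to be stripped first.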

Solution 3:[3]

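Using sed with the word-boundary anchors \< and \> (a GNU extension), so that only a standalone 00 is matched: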

$ sed 's|\<00\>|N/A|g' input_file
IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       N/A           AG        GT            AK          N/A
32      AG           GG        AA            N/A          AT
98      TT           AA        N/A            AG          AA
3       GG           AG        N/A            GT          GG

Solution 4:[4]

You can also skip the first row by applying the substitution from the second line through the last:

sed '2,$s~00~N/A~g' allSNIPsFinaldata.csv

If you don't want partial word matches, you can implement word boundaries around the 00 in different ways.
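One way to combine the line range with word boundaries, sketched here with GNU sed's `\b` (an extension, so this may not work with other sed implementations). The function name `skip_header_whole_word` and the sample lines are made up for illustration:

```shell
# Hypothetical helper: from line 2 to the end, replace 00 only where it
# stands alone as a word (\b is a GNU sed word-boundary extension).
skip_header_whole_word() {
  sed '2,$s~\b00\b~N/A~g'
}

printf '%s\n' 'IID,kgp11004425,kgp183005' '300,00,AG' '400,GT,00' | skip_header_whole_word
# IID,kgp11004425,kgp183005
# 300,N/A,AG
# 400,GT,N/A
```

Values like 300 and 400 are left alone because the 00 inside them is not bounded on both sides. An IID that is exactly 00 would still be replaced, so this does not strictly exclude the first column.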

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Klaus Klein
Solution 2 anubhava
Solution 3 HatLess
Solution 4