awk: split on the first occurrence of a character
I'm trying to use awk to split each line. If there is more than one p or q, the split on the ( does not work correctly (line 2 is an example). I am not able to ignore the second occurrence when there is more than one. I tried ^pq, but that did not produce the desired output. Thank you :).
file:
1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2
awk:
awk '{split($0,a,"[pq(_]"); print "id"a[1],a[3]}' file
current output:
id1 120785011
id1 21.1
desired output:
id1 120785011
id1 143192432
Solution 1:[1]
Another awk:
$ awk -F'[(_]' '{split($0,a,"[pq]"); print "id"a[1],$2}' file
id1 120785011
id1 143192432
Since you don't control the number of p/q characters in the line, use two different splits: one for the field delimiter to find the value, the second for the id.
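A small sketch that prints what each of the two splits produces, assuming the sample lines are saved in a file named file as in the question. The field split on ( and _ always puts the number in $2, while split() on [pq] always leaves the id in a[1], no matter how many p or q characters follow it:

```shell
# Recreate the question's sample input (assumed file name: file)
printf '%s\n' '1p11.2(120785011_120793480)x3' '1q12q21.1(143192432_143450240)x1~2' > file

# Split 1: -F'[(_]' makes $2 the number between "(" and "_"
# Split 2: split() on [pq] makes a[1] the id, however many p/q follow it
awk -F'[(_]' '{
    split($0, a, "[pq]")
    printf "$1=%s  $2=%s  a[1]=%s\n", $1, $2, a[1]
}' file
# $1=1p11.2  $2=120785011  a[1]=1
# $1=1q12q21.1  $2=143192432  a[1]=1
```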
Solution 2:[2]
The split() function returns the number of fields, so we can take advantage of that: the value we want is always the second-to-last element.
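awk '{
    n = split($0, a, /[pq(_]/)          # n = number of pieces produced
    printf "id%s %s\n", a[1], a[n-1]    # the last piece follows "_", so a[n-1] is the first number
}' file
which outputs: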
id1 120785011
id1 143192432
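To see why the question's a[3] picks up 21.1 on the second line while a[n-1] works on both, it can help to dump everything split() returns. A minimal sketch, assuming the sample lines are saved in a file named file:

```shell
# Recreate the question's sample input (assumed file name: file)
printf '%s\n' '1p11.2(120785011_120793480)x3' '1q12q21.1(143192432_143450240)x1~2' > file

# Show how many pieces split() returns and what each one holds
awk '{
    n = split($0, a, /[pq(_]/)
    printf "n=%d:", n
    for (i = 1; i <= n; i++) printf " a[%d]=%s", i, a[i]
    printf "\n"
}' file
# n=4: a[1]=1 a[2]=11.2 a[3]=120785011 a[4]=120793480)x3
# n=5: a[1]=1 a[2]=12 a[3]=21.1 a[4]=143192432 a[5]=143450240)x1~2
```

The extra q on the second line adds one more piece, shifting the value from a[3] to a[4]; counting back from the end with a[n-1] is unaffected by that shift.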
Solution 3:[3]
Here is something you can do using the FS regex itself, keeping the awk program simple:
awk -F '[(_]|[pq]([^pq]*[pq])*' '{print "id" $1, $3}' file
id1 120785011
id1 143192432
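FS regex details:
[(_] : Match ( or _
| : OR
[pq]([^pq]*[pq])* : Match p or q, followed by zero or more repetitions of (zero or more non-p/q characters followed by p or q)
On line 2 the whole q12q run is therefore consumed as a single separator, so $3 is always the first number inside the parentheses.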
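A quick way to check how this FS splits each line is to print every field. A sketch, assuming the sample lines are in a file named file:

```shell
# Recreate the question's sample input (assumed file name: file)
printf '%s\n' '1p11.2(120785011_120793480)x3' '1q12q21.1(143192432_143450240)x1~2' > file

# Print every field produced by the FS regex above
awk -F '[(_]|[pq]([^pq]*[pq])*' '{
    for (i = 1; i <= NF; i++) printf "$%d=%s%s", i, $i, (i < NF ? "  " : "\n")
}' file
# $1=1  $2=11.2  $3=120785011  $4=120793480)x3
# $1=1  $2=21.1  $3=143192432  $4=143450240)x1~2
```

Both lines end up with four fields, and the value sits in $3 either way.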
Solution 4:[4]
I'd use sed for this, since it's a simple substitution on a single line, which is what sed is best at:
$ sed 's/\([^pq]*\)[^(]*(\([^_]*\).*/id\1 \2/' file
id1 120785011
id1 143192432
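The same substitution can also be written with extended regular expressions, where the capture groups need no backslashes and the literal parenthesis is escaped instead. This sketch assumes a sed that accepts -E (GNU and BSD sed both do). [^pq]* captures the id, [^(]* skips past any number of p/q parts up to the first (, and [^_]* captures the value up to the _:

```shell
# Recreate the question's sample input (assumed file name: file)
printf '%s\n' '1p11.2(120785011_120793480)x3' '1q12q21.1(143192432_143450240)x1~2' > file

# ERE form: ( ) are groups, \( is a literal "("
sed -E 's/([^pq]*)[^(]*\(([^_]*).*/id\1 \2/' file
# id1 120785011
# id1 143192432
```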
Solution 5:[5]
UPDATE 1: realized I could make it even more succinct:
mawk 'sub("^","id")<--NF' FS='[pq][^(]+[(]|[_].+$' file
It works even when there are empty rows embedded in the input: because sub() runs first, an empty row already holds one field ("id") by the time NF is decremented, so NF never goes negative, which would trigger an error message.
=============================================================
An awk-based solution that requires neither further (and redundant) array-splitting nor a back-reference-capable regex engine:
input:
1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2
command:
mawk 'sub("^","id",$!(NF*=2<NF))' FS='[pq][^(]+[(]|[_].+$' file
output:
id1 120785011
id1 143192432
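For comparison, here is a variant that reuses the same FS but does not rely on assigning to NF; it simply prints the first two fields and uses NF as a pattern to skip empty rows. This is a sketch of an alternative, not the answer's own command, and assumes input in a file named file (an empty row is included to exercise that case):

```shell
# Sample input with an empty row in the middle (assumed file name: file)
printf '%s\n' '1p11.2(120785011_120793480)x3' '' '1q12q21.1(143192432_143450240)x1~2' > file

# FS eats everything from the first p/q through "(", and from "_" to end of line;
# the NF pattern is false for empty rows, so they produce no output
awk -F'[pq][^(]+[(]|[_].+$' 'NF { print "id" $1, $2 }' file
# id1 120785011
# id1 143192432
```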
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | karakfa |
Solution 2 | glenn jackman |
Solution 3 | anubhava |
Solution 4 | Ed Morton |
Solution 5 |