How to make a regex work with the perl command and extract numbers from a file?
I'm trying to extract a number from a tab-delimited file and store it in a variable. I'm approaching the problem with a regex that I was able to build thanks to some research online.
The file is structured as follows:
0 0 2500 5000
1 5000 7500 10000
2 10000 12500 15000
3 15000 17500 20000
4 20000 22500 25000
5 25000 27500 30000
I need to extract the number in the second column given a value in the first one. I wrote the following regex and tested it online:
(?<=5\t).*?(?=\t)
I need the 25000 from the sixth line.
I started working with sed but, as you probably know, it doesn't support lookbehind and lookahead patterns, even with the -E option that enables extended regular expressions. I also tried awk and grep and failed for similar reasons.
Going further, I found that perl could be the right command, but I'm not able to make it work properly. I'm trying the command:
perl -pe '/(?<=5\t).*?(?=\t)/' | INFO.out
but I admit my poor knowledge and I'm a bit lost.
The next step would be to read the "5" in the regex from a variable, so if you already know of problems that could arise, please let me know.
Solution 1:[1]
No need for lookbehinds: split each line on whitespace and check whether the first field is 5.
In Perl there is a convenient command-line option for this, -a, which splits each line for us and puts the fields in the @F array:
perl -lanE'say $F[1] if $F[0] == 5' data.txt
Note that this tests for 5 numerically (==).
Solution 2:[2]
GNU grep supports -P for Perl-compatible regex and -o for printing only the matching part, so this works with a lookbehind:
grep -Po '(?<=5\t)\d+' file
That can use a shell variable pretty easily (note the double quotes, so that the shell expands $VAR):
VAR=5 && grep -Po "(?<=$VAR\t)\d+" file
Or perl -n, using s///e to match and print the capture group:
perl -lne 's/^5\t(\d+)/print $1/e' file
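As a hedged variation on the grep approach, the lookbehind can be anchored with ^ so that the key only matches as the whole first column, and the result can be captured with command substitution. This sketch assumes GNU grep built with PCRE support (-P); the names key, value and the temporary file are illustrative.

```shell
# Recreate the sample tab-delimited data in a temporary file (illustrative).
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  0 0 2500 5000 \
  1 5000 7500 10000 \
  2 10000 12500 15000 \
  3 15000 17500 20000 \
  4 20000 22500 25000 \
  5 25000 27500 30000 > "$data"

key=5   # first-column value to look up (assumed name)

# Double quotes let the shell expand ${key}; \t and \d reach grep unchanged.
# The ^ inside the lookbehind ties the key to the start of the line.
value=$(grep -Po "(?<=^${key}\t)\d+" "$data")

echo "$value"   # prints 25000 for the sample data
rm -f "$data"
```

Without the anchor, a key of 5 would also match after a first column such as 15 if the file contained one.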
Solution 3:[3]
Why do you need to use a regex? If all you are doing is finding lines starting with a 5 and getting the second column, you could use sed and cut, e.g.:
<infile sed -n '/^5\t/p' | cut -f2
Output:
25000
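In the same spirit of avoiding a regex altogether, a field comparison in awk is another possible sketch. This is an alternative technique, not the answer author's method: -F'\t' sets the field separator to a tab and -v passes the shell value in as an awk variable. The names key, value and the temporary file are illustrative.

```shell
# Recreate the sample tab-delimited data in a temporary file (illustrative).
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  0 0 2500 5000 \
  1 5000 7500 10000 \
  2 10000 12500 15000 \
  3 15000 17500 20000 \
  4 20000 22500 25000 \
  5 25000 27500 30000 > "$data"

key=5   # first-column value to look up (assumed name)

# Print the second field of every line whose first field equals key.
value=$(awk -F'\t' -v key="$key" '$1 == key { print $2 }' "$data")

echo "$value"   # prints 25000 for the sample data
rm -f "$data"
```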
Solution 4:[4]
One option is to use sed: match 5 at the start of the line and, after the tab, capture the digits in a group:
sed -En 's/^5\t([[:digit:]]+)\t.*/\1/p' file > INFO.out
The file INFO.out contains:
25000
Solution 5:[5]
Using sed (the capture group matches digits only, so it stops at the next tab or space):
$ var1=$(sed -n 's/^5[^0-9]*\([0-9]*\).*/\1/p' input_file)
$ echo "$var1"
25000
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | zdim |
Solution 2 | |
Solution 3 | |
Solution 4 | The fourth bird |
Solution 5 | HatLess |