How to make a regex work with the perl command and extract numbers from a file?

I'm trying to extract a number from a tab-delimited file and store it in a variable. I'm approaching the problem with a regex that I was able to build thanks to some research online.

The file is structured as follows:

0   0   2500    5000
1   5000    7500    10000
2   10000   12500   15000
3   15000   17500   20000
4   20000   22500   25000
5   25000   27500   30000

I need to extract the number in the second column given a number in the first one. I wrote the following regex and tested it online:

(?<=5\t).*?(?=\t)

I need the 25000 from the sixth line.

I started working with sed but, as you already know, it doesn't support lookbehind and lookahead patterns, even with the -E option that enables extended regular expressions. I also tried awk and grep and failed for similar reasons.

Going further, I found that perl could be the right tool, but I'm not able to make it work properly. I'm trying the command

perl -pe '/(?<=5\t).*?(?=\t)/' | INFO.out

but I admit my knowledge is poor and I'm a bit lost.

The next step would be to read the "5" in the regex from a variable, so if you already know of problems that could arise, please let me know.



Solution 1:[1]

No need for lookbehinds -- split each line on whitespace and check whether the first field is 5.

Perl has a command-line option that is convenient for this, -a, which splits each line for us and puts the fields in the @F array:

perl -lanE'say $F[1] if $F[0] == 5' data.txt

Note that this tests for 5 numerically (==).

Solution 2:[2]

GNU grep supports -P for Perl-compatible regexes and -o to print only the matching part, so this works with a lookbehind:

grep -Po '(?<=5\t)\d+' file

That can use a shell variable pretty easily:

VAR=5 && grep -Po "(?<=$VAR\t)\d+" file
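
One caveat with the lookbehind as written: it is not anchored, so "5\t" also matches at the end of a longer first field such as 15 or 25. A hedged sketch of an anchored variant follows, using \K to drop the key from the reported match. It assumes a grep built with -P support; the two sample rows and the names data, loose and strict are invented for the demonstration.

```shell
# Two made-up rows: the key 5, and a key 15 that merely ends in 5.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    5 25000 27500 30000 \
    15 75000 77500 80000 > "$data"

VAR=5

# Unanchored lookbehind: also fires after the "5<TAB>" that ends "15<TAB>".
loose=$(grep -Po "(?<=$VAR\t)\d+" "$data")

# Anchored: the key must be the whole first field; \K discards it from the match.
strict=$(grep -Po "^$VAR\t\K\d+" "$data")

printf 'loose:\n%s\n' "$loose"     # two lines: 25000 and 75000
printf 'strict:\n%s\n' "$strict"   # one line: 25000
rm -f "$data"
```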

Or use perl -n, here with s///e to match and print the capture group:

perl -lne 's/^5\t(\d+)/print $1/e' file

Solution 3:[3]

Why do you need to use a regex? If all you are doing is finding lines starting with a 5 and getting the second column you could use sed and cut, e.g.:

<infile sed -n '/^5\t/p' | cut -f2

Output:

25000
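
In the same spirit, awk can compare the first field directly, with no lookarounds, and -v hands it a shell variable. This is only a sketch under the assumption that the file is tab-delimited as described in the question; key, value and data are placeholder names.

```shell
# Rebuild the sample data from the question as a tab-delimited temp file.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    0 0 2500 5000 \
    1 5000 7500 10000 \
    2 10000 12500 15000 \
    3 15000 17500 20000 \
    4 20000 22500 25000 \
    5 25000 27500 30000 > "$data"

key=5   # hypothetical lookup value

# -F'\t' splits on tabs; -v copies the shell variable into an awk variable.
value=$(awk -F'\t' -v key="$key" '$1 == key { print $2 }' "$data")

echo "$value"   # prints 25000
rm -f "$data"
```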

Solution 4:[4]

One option is to use sed: match 5 at the start of the line and, after the tab, capture the digits in a group:

sed -En 's/^5\t([[:digit:]]+)\t.*/\1/p' file > INFO.out

The file INFO.out contains:

25000
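
To read the 5 from a shell variable with this approach, the sed script can be double-quoted so that the variable expands. The sketch below also swaps \t for a literal tab, since \t inside a sed regex is not guaranteed outside GNU sed. The names key, tab, value and data are assumptions made for the example.

```shell
# Rebuild the sample data from the question as a tab-delimited temp file.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    0 0 2500 5000 \
    1 5000 7500 10000 \
    2 10000 12500 15000 \
    3 15000 17500 20000 \
    4 20000 22500 25000 \
    5 25000 27500 30000 > "$data"

key=5                 # hypothetical lookup value
tab=$(printf '\t')    # a literal tab character

# Double quotes let ${key} and ${tab} expand; \1 is passed through to sed as-is.
value=$(sed -En "s/^${key}${tab}([[:digit:]]+)${tab}.*/\1/p" "$data")

echo "$value"   # prints 25000
rm -f "$data"
```

This is only safe while the key consists of digits; a value containing regex metacharacters or a slash would need escaping first.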

Solution 5:[5]

Using sed

$ var1=$(sed -n 's/^5[^0-9]\{1,\}\([0-9]*\).*/\1/p' input_file)
$ echo "$var1"
25000

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 zdim
Solution 2
Solution 3
Solution 4 The fourth bird
Solution 5 HatLess