Perl: match nested regex and write back to file

I have a text file in which I would like to match a certain pattern, convert a numeric field from decimal to hex, and write the result back to the existing file using Perl.

Before

Wins {
  XYZ: 8,  SCHOOL1, SCHOOL2;
  ABC: 1,  SCHOOL4, SCHOOL4;
}

Losses {
  XYZABC: 100, SCHOOL1, 4 {
  IPQSAP, 2;
}

  EFGHIJ: 200, SCHOOL2, 9 {
  IBQCAP, 2;
}
}

After

TEST_version = "1_1.0";

Wins {
  XYZ: 8, SCHOOL1, SCHOOL2;
  ABC: 1, SCHOOL4, SCHOOL4;
}

Losses {
  XYZABC: 0x64, SCHOOL1, 4 {
  IPQSAP, 2;
}

  EFGHIJ: 0xC8, SCHOOL2, 9 {
  IBQCAP, 2;

}
}

In the example above, inside the Losses { block, each entry starts with an alphanumeric label followed by a colon (XYZABC:, EFGHIJ:, or any other such label). If the value after the label is a decimal number, it should be converted to hexadecimal.

I am trying something like the pattern below, but I am not able to convert the value and write it to a new file.

if (/Losses {/)
   if ( /(\w+):\s*(\w+),\s*(\w+),\s*(\w+)/ ) {
    
        if ($2 =~ /0[xX][0-9a-fA-F]+/)
        {
         //Value is already in Hexadecimal
        } elsif($2 =~ /[0-9]+/) {
            print("\n Decimal");
            my $hex = sprintf("0x%02X", $2);
            print(" Hex equivalent = $hex");  
         }
   }
}


Solution 1:[1]

The main challenge with a regex approach is the nested braces. The Text::Balanced module handles this: its extract_bracketed function lets you specify which type of brackets to parse. The tricky part is getting the regex for the prefix right. My example demonstrates this with two prefix patterns, the first of which fails.

If the prefix regex is wrong or there is another problem, 'extract_bracketed' writes the error to $@. It sets $@ to undef if there are no errors.

Once you have extracted the Losses section, you can use a simple regex to find each decimal number and convert it to hex. I used the 'e' modifier so that the replacement is evaluated as an sprintf expression, which handles the conversion to hex.
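As a minimal sketch of that substitution on its own, the snippet below applies it to a single hypothetical line shaped like the Losses entries. It assumes the number to convert always follows a "label:" prefix. Note that "%#x" produces lowercase hex digits, while "0x%X" produces the uppercase form (0xC8) shown in the question's expected output.

```perl
use strict;
use warnings;

# Hypothetical sample line, shaped like the entries in the Losses block.
my $line = "  EFGHIJ: 200, SCHOOL2, 9 {";

# /e evaluates the replacement as Perl code. The trailing \b stops (\d+)
# from matching the leading 0 of a value that is already hex, such as 0x64.
(my $lower = $line) =~ s/(\w+:\s*)(\d+)\b/sprintf '%s%#x', $1, $2/e;
(my $upper = $line) =~ s/(\w+:\s*)(\d+)\b/sprintf '%s0x%X', $1, $2/e;

print "$lower\n";   # hex digits in lowercase
print "$upper\n";   # hex digits in uppercase
```

Only the first number after the label is touched, because the pattern is anchored on the "label:" prefix and the /g modifier is not used here.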

use warnings;
use strict;
use Text::Balanced 'extract_bracketed';

# Slurp the file contents.
# local $/ sets the record separator to undef within the block.
# <DATA> will then read the whole file.

my $file_contents = do { local $/; <DATA> };

#               These prefixes are regex patterns for 'extract_bracketed'.
my @prefix_patterns = (
                        '.*Losses ',      # The '.' doesn't match newlines so
                                          # this doesn't match anything.
                        '(?s).*Losses ',  # (?s) lets '.' match newlines.
                     );

foreach my $prefix (@prefix_patterns){
    my ($extracted, $remainder, $prefix_match) = extract_bracketed(
                                                           $file_contents, 
                                                           '{}', 
                                                           $prefix );

    # 'extract_bracketed' writes errors to $@. It is undef if no error.
    if (length($@)){
        print "Error: $@\n\n";
    }
    else {
        print "Extracted:\n$extracted\n";
        print '-'x75, "\n";
        #
        # The 'e' modifier causes the substitution operator to evaluate the
        #    replacement (the second part) as a Perl expression.
        #
        # The trailing \b keeps (\d+) from matching the leading 0 of a value
        # that is already in hex format (e.g. 0x64).
        #
        $extracted =~ s/(\w+:\s*)(\d+)\b/sprintf "%s%#x", $1, $2/eg;
        print "Modified:\n$prefix_match$extracted$remainder\n";
    }
}
__DATA__
Wins {
  XYZ: 8,  SCHOOL1, SCHOOL2;
  ABC: 1,  SCHOOL4, SCHOOL4;
}

Losses {
  XYZABC: 100, SCHOOL1, 4 {
  IPQSAP, 2;
}

  EFGHIJ: 200, SCHOOL2, 9 {
  IBQCAP, 2;
}
}

Draws {}

I added the 'Draws' section to your sample data to demonstrate that extract_bracketed will extract the part we want in the middle of the file.

The output from this script looks like this:

Error: Did not find prefix: /.*Losses /, detected at offset 0

Extracted:
{
  XYZABC: 100, SCHOOL1, 4 {
  IPQSAP, 2;
}

  EFGHIJ: 200, SCHOOL2, 9 {
  IBQCAP, 2;
}
}
---------------------------------------------------------------------------
Modified:
Wins {
  XYZ: 8,  SCHOOL1, SCHOOL2;
  ABC: 1,  SCHOOL4, SCHOOL4;
}

Losses {
  XYZABC: 0x64, SCHOOL1, 4 {
  IPQSAP, 2;
}

  EFGHIJ: 0xc8, SCHOOL2, 9 {
  IBQCAP, 2;
}
}

Draws {}
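
The script above prints the modified text rather than saving it. Below is a minimal sketch of writing the result back to the original file, assuming the file is small enough to slurp and contains a single Losses block. The names convert_losses and rewrite_file are hypothetical helpers, not part of Text::Balanced. The sketch uses "0x%X" instead of "%#x" to get the uppercase digits (0xC8) from the question.

```perl
use strict;
use warnings;
use Text::Balanced 'extract_bracketed';

# Hypothetical helper: returns the text with the decimal values in the
# Losses block converted to hex. Dies if the block cannot be extracted.
sub convert_losses {
    my ($text) = @_;
    my ($block, $rest, $before) =
        extract_bracketed($text, '{}', '(?s).*Losses ');
    die "Could not extract Losses block: $@\n" unless defined $block;

    # \b keeps (\d+) from matching the leading 0 of an existing hex value.
    $block =~ s/(\w+:\s*)(\d+)\b/sprintf '%s0x%X', $1, $2/eg;
    return $before . $block . $rest;
}

# Hypothetical helper: reads the whole file, converts it, and overwrites it.
sub rewrite_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!\n";
    my $text = do { local $/; <$in> };
    close $in;

    # Convert before reopening for writing, so a failed extraction
    # leaves the original file untouched.
    my $new = convert_losses($text);

    open my $out, '>', $path or die "Cannot write $path: $!\n";
    print {$out} $new;
    close $out or die "Cannot close $path: $!\n";
}

rewrite_file($ARGV[0]) if @ARGV;
```

It could be run as "perl convert.pl data.txt", where both names are placeholders. Because the file is overwritten in place, keeping a backup copy is sensible; writing to a temporary file and renaming it afterward would be a safer variation.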

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1