Regex to match strings not enclosed in macro

In a development context, I would like to make sure all strings in source files within certain directories are enclosed in some macro "STR_MACRO". For this I will be using a Python script parsing the source files, and I would like to design a regex for detecting non-commented lines with strings not enclosed in this macro.

For instance, the regex should match the following strings:

std::cout << "Hello World!" << std::endl;
load_file("Hello World!");

But not the following ones:

std::cout << STR_MACRO("Hello World!") << std::endl;
load_file(STR_MACRO("Hello World!"));
// "foo" bar

Excluding commented lines containing strings seems to work well with the regex ^(?!\s*//).*"([^"]+)". However, when I also try to exclude non-commented strings already enclosed in the macro, using the regex ^(?!\s*//).*(?!STR_MACRO\()"([^"]+)", nothing changes: the enclosed strings are still matched (seemingly because of the opening parenthesis after STR_MACRO).

Any hints on how to achieve this?
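
For reference, the behaviour described above can be reproduced with the standard re module. The sketch below runs the second pattern from the question against the five sample lines, next to a hypothetical variant that swaps the lookahead for a negative lookbehind (the names lookahead, lookbehind, ahead_hits and behind_hits are illustrative only). It suggests that the opening parenthesis is probably not the culprit: the lookahead is tested at the position of the opening quote, where the following text can never be STR_MACRO(.

```python
import re

lines = [
    'std::cout << "Hello World!" << std::endl;',
    'load_file("Hello World!");',
    'std::cout << STR_MACRO("Hello World!") << std::endl;',
    'load_file(STR_MACRO("Hello World!"));',
    '// "foo" bar',
]

# Pattern from the question: the negative lookahead sits right before the
# opening quote, so the text it inspects always starts with '"'.
lookahead = re.compile(r'^(?!\s*//).*(?!STR_MACRO\()"([^"]+)"')

# Illustrative variant: a negative lookbehind inspects the text that
# precedes the opening quote instead.
lookbehind = re.compile(r'^(?!\s*//).*(?<!STR_MACRO\()"([^"]+)"')

ahead_hits = [bool(lookahead.search(line)) for line in lines]
behind_hits = [bool(lookbehind.search(line)) for line in lines]

print(ahead_hits)
print(behind_hits)
```

The lookahead version reports a match on all four non-comment lines, while the lookbehind version reports one only on the first two. The lookbehind variant is still fragile: on a line holding several literals, the closing quote of one literal can be paired with the opening quote of the next, so a pattern that consumes whole string literals is likely the safer route.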



Solution 1:[1]

With the PyPI regex module (which you can install with pip install regex in the terminal), you can use

import regex

pattern = r'''(?:^//.*|STR_MACRO\("[^"\\]*(?:\\.[^"\\]*)*"\))(*SKIP)(*F)|"[^"\\]*(?:\\.[^"\\]*)*"'''
text = r'''For instance, the regex should match the following strings:

std::cout << "Hello World!" << std::endl;
load_file("Hello World!");
But not the following ones:

std::cout << STR_MACRO("Hello World!") << std::endl;
load_file(STR_MACRO("Hello World!"));
// "foo" bar'''
print(regex.sub(pattern, r'STR_MACRO(\g<0>)', text, flags=regex.M))

Details:

  • (?:^//.*|STR_MACRO\("[^"\\]*(?:\\.[^"\\]*)*"\))(*SKIP)(*F) - // at the line start and the rest of the line, or STR_MACRO( + a double quoted string literal pattern + ), and then the match is skipped, and the next match search starts at the failure location
  • | - or
  • "[^"\\]*(?:\\.[^"\\]*)*" - a " char, zero or more chars other than " and \, then zero or more repetitions of a \ followed by any single char and then zero or more chars other than " and \, and then a " char
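
If installing a third-party module is not an option, the skip-or-match idea described in the list above can be approximated with the standard re module, which has no (*SKIP)(*F) verbs. The following is only a sketch under assumptions: the contexts to skip are matched without a capture group, the bare literal is captured in group 1, and a replacement callback rewrites only when group 1 took part in the match. The names STRING, PATTERN and wrap_unwrapped are hypothetical, and the comment branch tolerates indentation before //, which goes beyond the original pattern.

```python
import re

# Double-quoted string literal with support for escaped characters.
STRING = r'"[^"\\]*(?:\\.[^"\\]*)*"'

# Alternatives are tried left to right: whole-line comments and literals
# already wrapped in STR_MACRO(...) are consumed without a capture, while
# a bare literal ends up in group 1.
PATTERN = re.compile(
    rf'^[ \t]*//.*|STR_MACRO\({STRING}\)|({STRING})',
    flags=re.MULTILINE,
)


def wrap_unwrapped(source: str) -> str:
    """Wrap every bare string literal in STR_MACRO(...)."""
    def repl(match):
        if match.group(1) is not None:
            # A bare literal was captured: wrap it.
            return f'STR_MACRO({match.group(1)})'
        # A skipped context matched: put it back unchanged.
        return match.group(0)

    return PATTERN.sub(repl, source)


print(wrap_unwrapped('load_file("Hello World!");'))
```

Because the skipped alternatives consume their text, the scan resumes after them, which is roughly the effect the (*SKIP)(*F) verbs have in the regex module.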

Output:

For instance, the regex should match the following strings:

std::cout << STR_MACRO("Hello World!") << std::endl;
load_file(STR_MACRO("Hello World!"));
But not the following ones:

std::cout << STR_MACRO("Hello World!") << std::endl;
load_file(STR_MACRO("Hello World!"));
// "foo" bar
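
Since the question asks for detection rather than rewriting, the same literal pattern could also be used to report offending lines. The sketch below is one possible approach using only the standard re module. The function name find_unwrapped and the (line number, literal) return format are assumptions made for illustration, and only whole-line // comments are skipped, as in the question's examples.

```python
import re

# Double-quoted string literal with support for escaped characters.
STRING = r'"[^"\\]*(?:\\.[^"\\]*)*"'

# Wrapped literals are consumed without a capture; bare ones land in group 1.
SCANNER = re.compile(rf'STR_MACRO\({STRING}\)|({STRING})')


def find_unwrapped(source):
    """Return (line_number, literal) pairs for literals outside STR_MACRO."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith('//'):
            continue  # whole-line comment: ignore any strings in it
        for match in SCANNER.finditer(line):
            if match.group(1) is not None:
                hits.append((number, match.group(1)))
    return hits


sample = '''std::cout << "Hello World!" << std::endl;
load_file("Hello World!");
std::cout << STR_MACRO("Hello World!") << std::endl;
load_file(STR_MACRO("Hello World!"));
// "foo" bar'''

for number, literal in find_unwrapped(sample):
    print(f'line {number}: {literal}')
```

On the sample above this reports the literals on lines 1 and 2 only. Strings inside trailing comments or /* ... */ blocks are not handled and would still be reported.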

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew