How to match start and end of input with std::regex on Visual Studio

From what I understand, the C++ regex anchor ^ should match only at the beginning of the input, and $ only at the end of the input. The C++17 std::regex::multiline flag changes this so that they also match at the beginning and end of every line. Unfortunately, Visual Studio 2017 does not conform to this behavior:

#include <string>
#include <iostream>
#include <regex>
#include <exception>

int main()
{
    std::string test = "\n \n\t \nThe previous three lines should be removed.\n    \nThe previous line shouldn't be removed, "
        "but the next two should be:\n\t\t\t\n  ";

    std::string out;
    try {
        std::regex re(R"(^\s*\n|\n\s*$)");
        out = std::regex_replace(test, re, "");
    }
    catch (const std::exception& e) {
        std::cout << e.what() << std::endl;
    }
    std::cout << out << std::endl;
}

On GCC this keeps the whitespace-only line between the two text lines, but under MSVC that line is removed. Is there any way to fix this behavior, or better yet, a portable solution? Is this a bug or intended behavior? Is it compliant with the standard?
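One way to check which behavior a particular standard library implements is a tiny probe like the one below. It is a sketch that assumes the default ECMAScript grammar and no multiline flag, and the function names are made up for illustration. Under the single-line semantics described in the question, neither anchor should match next to an embedded newline, so both functions return false on libstdc++ and libc++. According to the answer below, MSVC 2017's engine always behaves as if it were in multiline mode, so the same probes would be expected to return true there.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative probes (hypothetical names). With the default ECMAScript
// grammar and no multiline flag, ^ and $ should anchor only at the ends of
// the whole target sequence, not next to embedded '\n' characters.
// Example check on a conforming library:
//     assert(!caret_matches_after_newline());

// Returns true if ^ also matches right after an embedded newline,
// i.e. the library is behaving in "multiline" mode.
bool caret_matches_after_newline()
{
    const std::string text = "a\nb";
    return std::regex_search(text, std::regex("^b"));
}

// Returns true if $ also matches right before an embedded newline,
// i.e. the library is behaving in "multiline" mode.
bool dollar_matches_before_newline()
{
    const std::string text = "a\nb";
    return std::regex_search(text, std::regex("a$"));
}
```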



Solution 1:[1]

I got hung up on this issue as well. It turns out that this non-standard behavior is expected in Visual Studio 2017 for historical reasons, though Microsoft would like to change it in the future.

Here's a link with more info (pasted below for posterity): https://developercommunity.visualstudio.com/t/multiline-c/268592

We marked this as an LWG issue resolution rather than a feature in our C++17 support tables; this is ABI breaking for us to implement so it won't happen until the regex engine is overhauled in an ABI breaking release.

MSVC++'s engine always has "multiline" behavior, following Boost.Regex's design (from which the TR1 regex proposal was derived), which was multiline by default and had a singleline option. For some reason, though, the singleline switch from Boost.Regex wasn't standardized.

The other standard libraries assumed ECMAScript/browsers' defaults, and so only had single line mode.

Through discussion in LWG, it was decided that because the standard normatively references ECMAScript, and the default there is single line, std::regex should not be like Boost.Regex here: MSVC++ would need to change its engine to be singleline by default, and all the standard libraries would add a multiline switch.

Unfortunately the representation our regex engine uses doesn't allow easy incorporation of a singleline flag to implement singleline support, and even if it did, changing that default would be likely to introduce subtle changes in behavior to existing programs that just recompile with a compiler update.

As a workaround for now, we continue to recommend Boost.Regex; it not only has more consistent behavior in this area, its performance crushes that of all 3 major standard library implementations at present (https://www.boost.org/doc/libs/1_67_0/libs/regex/doc/html/boost_regex/background/performance.html).

Can't wait for the ABI break in the sky where we can fix this!

Billy O'Neal

Visual C++ Libraries
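If switching to Boost.Regex is not an option, one portable sketch that stays within std::regex is to avoid ^ and $ entirely and anchor by position instead. The standard flag std::regex_constants::match_continuous makes std::regex_search accept only a match that begins at the first character of the target, and a match whose suffix() is empty is one that ends exactly at the end of the input. The helper name trim_blank_edges is hypothetical. It only reproduces the intent of the question's ^\s*\n|\n\s*$ pattern (drop whitespace-only lines at the start, and the trailing newline-plus-whitespace run at the end), and because it uses no anchors, it should not depend on how a library treats ^ and $ at embedded newlines.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): removes whitespace-only
// lines at the start of the input and the "\n<whitespace>" run at its end,
// without using ^ or $. Example:
//     assert(trim_blank_edges("\n \nA\n  \nB\n\t\n ") == "A\n  \nB");
std::string trim_blank_edges(const std::string& text)
{
    std::string s = text;

    // Start of input: match_continuous only accepts a match that begins at
    // the first character of the target, like ^ in single-line mode.
    static const std::regex leading(R"(\s*\n)");
    std::smatch m;
    if (std::regex_search(s, m, leading, std::regex_constants::match_continuous)) {
        s.erase(0, static_cast<std::size_t>(m.length(0)));
    }

    // End of input: walk the "\n\s*" matches and find the one whose suffix
    // is empty, i.e. the match that runs to the very end of s.
    static const std::regex trailing(R"(\n\s*)");
    std::string::size_type cut = std::string::npos;
    for (std::sregex_iterator it(s.cbegin(), s.cend(), trailing), end; it != end; ++it) {
        if (it->suffix().length() == 0) {
            cut = static_cast<std::string::size_type>(it->position(0));
            break;
        }
    }
    if (cut != std::string::npos) {
        s.erase(cut);  // iterators into s are no longer used past this point
    }
    return s;
}
```

Applied to the question's sample string, this should drop the three leading blank lines and the trailing ones while keeping the whitespace-only line between the two text lines, which is the GCC result the question describes.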

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: afk