How to match start and end of input with std::regex on Visual Studio
From what I understand, the C++ regex symbol ^ should match only the beginning of input and $ should match only the end of input. This can be changed to match the beginning and end of every line with the std::regex::multiline flag.
Unfortunately Visual Studio 2017 fails to conform to this behavior:
    #include <string>
    #include <iostream>
    #include <regex>
    #include <exception>

    int main()
    {
        // Leading and trailing blank lines should go; the blank line in the middle should stay.
        std::string test = "\n \n\t \nThe previous three lines should be removed.\n \nThe previous line shouldn't be removed, "
                           "but the next two should be:\n\t\t\t\n ";
        std::string out;
        try {
            // ^ and $ are meant to anchor to the start and end of the whole input here.
            std::regex re(R"(^\s*\n|\n\s*$)");
            out = std::regex_replace(test, re, "");
        }
        catch (const std::exception& e) {
            std::cout << e.what() << std::endl;
        }
        std::cout << out << std::endl;
    }
On GCC this keeps the blank line between the two text lines, but under MSVC that line is removed as well. Is there any way to fix this behavior, or even better, a portable solution? Is it a bug or intended behavior? Is it compliant with the standard?
Solution 1:[1]
I got hung up on this issue as well. It turns out that this non-standard behavior is expected from Visual Studio 2017 for historical reasons, but they would like to change it in the future.
Here's a link with more info (pasted below for posterity): https://developercommunity.visualstudio.com/t/multiline-c/268592
We marked this as an LWG issue resolution rather than a feature in our C++17 support tables; this is ABI breaking for us to implement so it won't happen until the regex engine is overhauled in an ABI breaking release.
MSVC++'s engine always has "multiline" behavior, following Boost::Regex' design (from which the TR1 regex proposal was derived) which was multiline by default and had a singleline option. For some reason though, the singleline switch from Boost.Regex wasn't standardized.
The other standard libraries assumed ECMAScript/browsers' defaults, and so only had single line mode.
Through discussion in LWG it was decided that because the standard normatively references ECMAScript, and the default there is single line, that std::regex should not be like Boost.Regex here, MSVC++ would need to change its engine to be singleline by default, and all the standard libraries would add a multiline switch.
Unfortunately the representation our regex engine uses doesn't allow easy incorporation of a singleline flag to implement singleline support, and even if it did, changing that default would be likely to introduce subtle changes in behavior to existing programs that just recompile with a compiler update.
As a workaround for now, we continue to recommend Boost.Regex; it not only has more consistent behavior in this area, its performance crushes that of all 3 major standard library implementations at present ( https://www.boost.org/doc/libs/1_67_0/libs/regex/doc/html/boost_regex/background/performance.html )
Can't wait for the ABI break in the sky where we can fix this!
Billy O'Neal
Visual C++ Libraries
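If switching to Boost.Regex is not an option, one possible workaround is to avoid ^ and $ altogether. The sketch below is my own assumption and not part of the quoted reply: it relies on std::regex_match only succeeding when the pattern covers the entire input, so the start and end of input are implied regardless of how a given library treats the anchors. The helper name strip_blank_edge_lines is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): removes leading and
// trailing whitespace-only lines without using the ^ or $ anchors.
std::string strip_blank_edge_lines(const std::string& text)
{
    // std::regex_match must consume the whole input, so no anchors are needed.
    //   (?:\s*\n)?   leading whitespace up to and including its last newline
    //   ([\s\S]*?)   the part to keep, matched as short as possible
    //   (?:\n\s*)?   a newline followed only by whitespace up to the end
    static const std::regex re(R"((?:\s*\n)?([\s\S]*?)(?:\n\s*)?)");

    std::smatch m;
    if (std::regex_match(text, m, re)) {
        return m[1].str();
    }
    return text;  // fallback; the pattern is written so that it can match any input
}
```

With the test string from the question, strip_blank_edge_lines(test) should keep the whitespace-only line between the two text lines and drop the ones at both ends. Note that the lazy [\s\S]*? can be slow or recursion-heavy on long inputs with backtracking std::regex implementations, so for large texts a hand-written trim using std::string::find_first_not_of and find_last_not_of may be the safer choice.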
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | afk |