C++ - Split string by regex

I want to split a std::string by a regex.

I have found some solutions on Stack Overflow, but most of them split the string on a single space or use external libraries such as Boost.

I can't use Boost.

I want to split the string by the regex "\\s+".

I am using g++ (Debian 4.4.5-8) 4.4.5, and I can't upgrade.



Solution 1:[1]

You don't need to use regular expressions if you just want to split a string by multiple spaces. Writing your own regex library is overkill for something that simple.

The answer you linked to in your comments, "Split a string in C++?", can easily be changed so that it omits the empty elements produced by consecutive spaces.

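#include <sstream>
#include <string>
#include <vector>

// Splits s on delim and appends every non-empty piece to elems.
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        // Consecutive delimiters produce empty items; skip them.
        if (item.length() > 0) {
            elems.push_back(item);
        }
    }
    return elems;
}


// Convenience overload that returns a new vector.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}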

By checking that item.length() > 0 before pushing item onto the elems vector, you no longer get extra empty elements when your input contains consecutive delimiters (spaces in your case).
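
The regex in the question, "\\s+", matches runs of any whitespace (tabs and newlines too), while the split above handles one delimiter character. As a minimal sketch of the same idea without regex, the stream extraction operator can do the splitting, since it skips whitespace by default. The name split_whitespace is made up for this illustration and does not come from the linked answer; the code sticks to C++98, so it should build with an older g++.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on runs of any whitespace, similar to "\\s+".
std::vector<std::string> split_whitespace(const std::string &s) {
    std::istringstream in(s);
    std::vector<std::string> tokens;
    std::string word;
    // operator>> skips leading whitespace, then reads up to the next whitespace.
    while (in >> word) {
        tokens.push_back(word);
    }
    return tokens;
}
```

For example, split_whitespace("  foo \t bar\n\nbaz  ") would return foo, bar and baz, with no empty elements.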

Solution 2:[2]

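#include <iostream>
#include <regex>
#include <string>

// string_to_split is the std::string you want to split.
std::regex rgx("\\s+");
std::sregex_token_iterator iter(string_to_split.begin(),
    string_to_split.end(),
    rgx,
    -1);  // -1 selects the text between matches
std::sregex_token_iterator end;  // default-constructed: end of sequence
for ( ; iter != end; ++iter)
    std::cout << *iter << '\n';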

The -1 is the key here: it selects the text between matches rather than the matches themselves. When the iterator is constructed, it points at the text that precedes the first match, and after each increment it points at the text that follows the previous match.
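
To make the role of that last argument concrete, here is a small sketch that collects whatever the token iterator yields for a given submatch index. The helper name collect_tokens is invented for this illustration; it needs a compiler with a working C++11 <regex>.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gathers the tokens produced for one submatch index.
// submatch == -1 yields the text between matches; 0 yields the matches themselves.
std::vector<std::string> collect_tokens(const std::string &text,
                                        const std::regex &rgx,
                                        int submatch) {
    std::sregex_token_iterator iter(text.begin(), text.end(), rgx, submatch);
    std::sregex_token_iterator end;  // default-constructed: end of sequence
    return std::vector<std::string>(iter, end);
}
```

With the text "a1b22c" and the regex "[0-9]+", passing -1 gives a, b and c, while passing 0 gives 1 and 22.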

If you don't have C++11, the same thing should work with TR1 or (possibly with slight modification) with Boost.
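
On a compiler as old as the one in the question, <regex> may be missing or not usable. As a different technique from the one above (not part of this answer), a regex split can be sketched on top of the POSIX <regex.h> functions, assuming a POSIX system. The helper name posix_resplit is invented here, and the default pattern is written as "[[:space:]]+" because "\s" is not part of standard POSIX syntax.

```cpp
#include <regex.h>   // POSIX regcomp/regexec/regfree, not the C++ <regex> header

#include <string>
#include <vector>

// Hypothetical helper: splits s on matches of a POSIX extended regex.
// Empty tokens are dropped. Not handled: embedded '\0' characters and
// patterns anchored with '^' (those would need REG_NOTBOL after the first match).
std::vector<std::string> posix_resplit(const std::string &s,
                                       const char *pattern = "[[:space:]]+") {
    std::vector<std::string> tokens;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return tokens;  // the pattern did not compile; give back nothing
    }
    const char *cursor = s.c_str();
    regmatch_t m;
    while (*cursor != '\0' && regexec(&re, cursor, 1, &m, 0) == 0) {
        if (m.rm_eo == 0) {
            break;  // empty match at the cursor: stop instead of looping forever
        }
        if (m.rm_so > 0) {
            tokens.push_back(std::string(cursor, m.rm_so));  // text before the match
        }
        cursor += m.rm_eo;  // continue after the separator
    }
    if (*cursor != '\0') {
        tokens.push_back(std::string(cursor));  // text after the last match
    }
    regfree(&re);
    return tokens;
}
```

Calling posix_resplit("first   second\tthird  ") would return first, second and third; a custom pattern such as "[|:]" can be passed as the second argument.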

Solution 3:[3]

To expand on the answer by @Pete Becker, here is an example of a resplit function that splits text using a regex:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> resplit(const std::string &s, const std::regex &sep_regex = std::regex{"\\s+"}) {
  std::sregex_token_iterator iter(s.begin(), s.end(), sep_regex, -1);
  std::sregex_token_iterator end;
  return {iter, end};
}

This works as follows:

   // Assumes <iostream> is included and `using namespace std;` is in effect.
   string s1 = "first   second third    ";
   vector<string> v22 = resplit(s1);

   for (const auto & e: v22) {
       cout <<"Token:" << e << endl;
   }

   //Token:first
   //Token:second
   //Token:third


   string s222 = "first|second:third,fourth";
   // std::regex's constructor from a string is explicit, so build the regex explicitly.
   vector<string> v222 = resplit(s222, regex("[|:,]"));

   for (const auto & e: v222) {
       cout <<"Token:" << e << endl;
   }

   //Token:first
   //Token:second
   //Token:third
   //Token:fourth
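
One caveat with the -1 form: when the input starts with a separator, the first token is the empty string that precedes the first match, whereas a trailing separator adds no extra token (as the first example shows). A minimal sketch of a variant that skips empty tokens follows; the name resplit_nonempty is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant of resplit that drops empty tokens,
// such as the one produced by a leading separator.
std::vector<std::string> resplit_nonempty(const std::string &s,
                                          const std::regex &sep_regex = std::regex("\\s+")) {
    std::vector<std::string> tokens;
    std::sregex_token_iterator iter(s.begin(), s.end(), sep_regex, -1);
    std::sregex_token_iterator end;
    for (; iter != end; ++iter) {
        if (iter->length() > 0) {
            tokens.push_back(iter->str());
        }
    }
    return tokens;
}
```

With this variant, resplit_nonempty("  first second ") would return just first and second.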

Solution 4:[4]

// Requires <iostream>, <regex> and <string>, with `using namespace std;` in effect.
string s = "foo bar  baz";
regex e("\\s+");
regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
regex_token_iterator<string::iterator> end;
while (i != end)
   cout << " [" << *i++ << "]";

This prints [foo] [bar] [baz], with a space before each bracketed token.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   Community
Solution 2   Duloren
Solution 3   Martin Valgur
Solution 4   solstice333