How do I count the number of matches using C++11's std::regex?
std::regex re("[^\\s]+");
std::cout << re.matches("Harry Botter - The robot who lived.").count() << std::endl;
Expected output:
7
You can use regex_iterator to generate all of the matches, then use distance to count them:
std::regex const expression("[^\\s]+");
std::string const text("Harry Botter - The robot who lived.");
std::ptrdiff_t const match_count(std::distance(
std::sregex_iterator(text.begin(), text.end(), expression),
std::sregex_iterator()));
std::cout << match_count << std::endl;
You can use this:
int countMatchInRegex(std::string s, std::string re)
{
std::regex words_regex(re);
auto words_begin = std::sregex_iterator(
s.begin(), s.end(), words_regex);
auto words_end = std::sregex_iterator();
return std::distance(words_begin, words_end);
}
Example usage:
std::cout << countMatchInRegex("Harry Botter - The robot who lived.", "[^\\s]+");
Output:
7
Related
How to extract Test and Again from string s in below code.
Currently I am using regex_iterator and it doesn't seems to be matching groups in regular expression and I am getting {{Test}} and {{Again}} in output.
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
return 0;
}
I also tried using regex_search but it is not working with multiple patterns and only giving Test ouput
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
if (std::regex_search(s, match, rgx,std::regex_constants::match_any))
{
std::cout<<"Match size is "<<match.size()<<std::endl;
for(auto elem:match)
std::cout << "match: " << elem << '\n';
}
}
Also as a side note why two backslashes are needed to escape { or }
To access the contents of the capturing group you need to use .str(1):
std::cout << match.str(1) << std::endl;
See the C++ demo:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
// std::regex rgx("\\{\\{(\\w+)\\}\\}");
// Better, use a raw string literal:
std::regex rgx(R"(\{\{(\w+)\}\})");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << std::endl;
next++;
}
return 0;
}
Output:
Test
Again
Note you do not have to use double backslashes to define a regex escape sequence inside raw string literals (here, R"(pattern_here)").
I would like to split a string like this one
“this1245is#g$0,therhsuidthing345”
using a list of words like the one bellow
{“this”, “is”, “the”, “thing”}
into this list
{“this”, “1245”, “is”, “#g$0,”, “the”, “rhsuid”, “thing”, “345”}
// ^--------------^---------------^------------------^-- these were the delimiters
The delimiters are allowed to appear more than once in the string to split, and it can be done using regular expressions
The precedence is in the order in which the delimiters appear in the array
The platform I'm developing for has no support for the Boost library
Update
This is what I have for the moment
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this1245is#g$0,therhsuidthing345");
std::string delimiters[] = {"this", "is", "the", "thing"};
for (int i=0; i<4; i++) {
std::string delimiter = "(" + delimiters[i] + ")(.*)";
std::regex e (delimiter); // matches words beginning by the i-th delimiter
// default constructor = end-of-sequence:
std::sregex_token_iterator rend;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::sregex_token_iterator c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
}
return 0;
}
output:
1st and 2nd submatches:[this][x1245fisA#g$0,therhsuidthing345]
1st and 2nd submatches:[is][x1245fisA#g$0,therhsuidthing345]
1st and 2nd submatches:[the][rhsuidthing345]
1st and 2nd submatches:[thing][345]
I think I need to make some recursive thing to call on each iteration
Build the expression you want for matches only (re), then pass in {-1, 0} to your std::sregex_token_iterator to return all non-matches (-1) and matches (0).
#include <iostream>
#include <regex>
int main() {
std::string s("this1245is#g$0,therhsuidthing345");
std::regex re("(this|is|the|thing)");
std::sregex_token_iterator iter(s.begin(), s.end(), re, { -1, 0 });
std::sregex_token_iterator end;
while (iter != end) {
//Works in vc13, clang requires you increment separately,
//haven't gone into implementation to see if/how ssub_match is affected.
//Workaround: increment separately.
//std::cout << "[" << *iter++ << "] ";
std::cout << "[" << *iter << "] ";
++iter;
}
}
I don't know how to perform the precedence requirement. This seems to work on the given input:
std::vector<std::string> parse (std::string s)
{
std::vector<std::string> out;
std::regex re("\(this|is|the|thing).*");
std::string word;
auto i = s.begin();
while (i != s.end()) {
std::match_results<std::string::iterator> m;
if (std::regex_match(i, s.end(), m, re)) {
if (!word.empty()) {
out.push_back(word);
word.clear();
}
out.push_back(std::string(m[1].first, m[1].second));
i += out.back().size();
} else {
word += *i++;
}
}
if (!word.empty()) {
out.push_back(word);
}
return out;
}
vector<string> strs;
boost::split(strs,line,boost::is_space());
For example, If I have a string like "first second third forth" and I want to match every single word in one operation to output them one by one.
I just thought that "(\\b\\S*\\b){0,}" would work. But actually it did not.
What should I do?
Here's my code:
#include<iostream>
#include<string>
using namespace std;
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
regex_search(str, res, exp);
cout << res[0] <<" "<<res[1]<<" "<<res[2]<<" "<<res[3]<< endl;
}
Simply iterate over your string while regex_searching, like this:
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
string::const_iterator searchStart( str.cbegin() );
while ( regex_search( searchStart, str.cend(), res, exp ) )
{
cout << ( searchStart == str.cbegin() ? "" : " " ) << res[0];
searchStart = res.suffix().first;
}
cout << endl;
}
This can be done in regex of C++11.
Two methods:
You can use () in regex to define your captures(sub expressions).
Like this:
string var = "first second third forth";
const regex r("(.*) (.*) (.*) (.*)");
smatch sm;
if (regex_search(var, sm, r)) {
for (int i=1; i<sm.size(); i++) {
cout << sm[i] << endl;
}
}
See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7
You can use sregex_token_iterator():
string var = "first second third forth";
regex wsaq_re("\\s+");
copy( sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
sregex_token_iterator(),
ostream_iterator<string>(cout, "\n"));
See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0
sregex_token_iterator appears to be the ideal, efficient solution, but the example given in the selected answer leaves much to be desired. Instead, I found some great examples here:
http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/
For your convenience, I've copy-pasted the sample code shown by that page. I claim no credit for the code.
// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this subject has a submarine as a subsequence");
std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"
// default constructor = end-of-sequence:
std::regex_token_iterator<std::string::iterator> rend;
std::cout << "entire matches:";
std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
while (a!=rend) std::cout << " [" << *a++ << "]";
std::cout << std::endl;
std::cout << "2nd submatches:";
std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), e, 2 );
while (b!=rend) std::cout << " [" << *b++ << "]";
std::cout << std::endl;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::regex_token_iterator<std::string::iterator> c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
std::cout << "matches as splitters:";
std::regex_token_iterator<std::string::iterator> d ( s.begin(), s.end(), e, -1 );
while (d!=rend) std::cout << " [" << *d++ << "]";
std::cout << std::endl;
return 0;
}
Output:
entire matches: [subject] [submarine] [subsequence]
2nd submatches: [ject] [marine] [sequence]
1st and 2nd submatches: [sub] [ject] [sub] [marine] [sub] [sequence]
matches as splitters: [this ] [ has a ] [ as a ]
You could use the suffix() function, and search again until you don't find a match:
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
while (regex_search(str, res, exp)) {
cout << res[0] << endl;
str = res.suffix();
}
}
My code will capture all groups in all matches:
vector<vector<string>> U::String::findEx(const string& s, const string& reg_ex, bool case_sensitive)
{
regex rx(reg_ex, case_sensitive ? regex_constants::icase : 0);
vector<vector<string>> captured_groups;
vector<string> captured_subgroups;
const std::sregex_token_iterator end_i;
for (std::sregex_token_iterator i(s.cbegin(), s.cend(), rx);
i != end_i;
++i)
{
captured_subgroups.clear();
string group = *i;
smatch res;
if(regex_search(group, res, rx))
{
for(unsigned i=0; i<res.size() ; i++)
captured_subgroups.push_back(res[i]);
if(captured_subgroups.size() > 0)
captured_groups.push_back(captured_subgroups);
}
}
captured_groups.push_back(captured_subgroups);
return captured_groups;
}
My reading of the documentation is that regex_search searches for the first match and that none of the functions in std::regex do a "scan" as you are looking for. However, the Boost library seems to be support this, as described in C++ tokenize a string using a regular expression
How do I count the number of matches using C++11's std::regex?
std::regex re("[^\\s]+");
std::cout << re.matches("Harry Botter - The robot who lived.").count() << std::endl;
Expected output:
7
You can use regex_iterator to generate all of the matches, then use distance to count them:
std::regex const expression("[^\\s]+");
std::string const text("Harry Botter - The robot who lived.");
std::ptrdiff_t const match_count(std::distance(
std::sregex_iterator(text.begin(), text.end(), expression),
std::sregex_iterator()));
std::cout << match_count << std::endl;
You can use this:
int countMatchInRegex(std::string s, std::string re)
{
std::regex words_regex(re);
auto words_begin = std::sregex_iterator(
s.begin(), s.end(), words_regex);
auto words_end = std::sregex_iterator();
return std::distance(words_begin, words_end);
}
Example usage:
std::cout << countMatchInRegex("Harry Botter - The robot who lived.", "[^\\s]+");
Output:
7
How can I split a string with Boost with a regex AND have the delimiter included in the result list?
for example, if I have the string "1d2" and my regex is "[a-z]" I want the results in a vector with (1, d, 2)
I have:
std::string expression = "1d2";
boost::regex re("[a-z]");
boost::sregex_token_iterator i (expression.begin (),
expression.end (),
re);
boost::sregex_token_iterator j;
std::vector <std::string> splitResults;
std::copy (i, j, std::back_inserter (splitResults));
Thanks
I think you cannot directly extract the delimiters using boost::regex. You can, however, extract the position where the regex is found in your string:
std::string expression = "1a234bc";
boost::regex re("[a-z]");
boost::sregex_iterator i(
expression.begin (),
expression.end (),
re);
boost::sregex_iterator j;
for(; i!=j; ++i) {
std::cout << (*i).position() << " : " << (*i) << std::endl;
}
This example would show:
1 : a
5 : b
6 : c
Using this information, you can extract the delimitiers from your original string:
std::string expression = "1a234bc43";
boost::regex re("[a-z]");
boost::sregex_iterator i(
expression.begin (),
expression.end (),
re);
boost::sregex_iterator j;
size_t pos=0;
for(; i!=j;++i) {
std::string pre_delimiter = expression.substr(pos, (*i).position()-pos);
std::cout << pre_delimiter << std::endl;
std::cout << (*i) << std::endl;
pos = (*i).position() + (*i).size();
}
std::string last_delimiter = expression.substr(pos);
std::cout << last_delimiter << std::endl;
This example would show:
1
a
234
b
c
43
There is an empty string betwen b and c because there is no delimiter.