I'm trying to use regular expressions to parse an SQL statement, but I'm confused by the behavior of sregex_token_iterator.
My functions f() and g() look similar, yet f() prints two conditions while g() prints only one.
Here is f():
void f()
{
cout << "in f()" << endl;
string str = " where a <= 2 and b = 2";
smatch result;
regex pattern("(\\w+\\s*(<|=|>|<>|<=|>=)\\s*\\w+)");
const sregex_token_iterator end;
for (sregex_token_iterator it(str.begin(), str.end(), pattern); it != end; it ++)
{
cout << *it << endl;
}
}
Here is g():
void g()
{
cout << "in g()" << endl;
string str = " where a <= 2 and b = 2";
smatch result;
regex pattern("(\\w+\\s*(<|=|>|<>|<=|>=)\\s*\\w+)");
const sregex_token_iterator end;
for (sregex_token_iterator it(str.begin(), str.end(), pattern); it != end; it ++)
{
cout << *it << endl;
string cur = *it;
pattern = "(\\w+)\\s*<>\\s*(\\w+)";
if ( regex_match(cur, result, pattern) )
{
// cout <<"<>" << endl;
}
pattern = "(\\w+)\\s*=\\s*(\\w+)";
if ( regex_match(cur, result, pattern) ){}
pattern = "(\\w+)\\s*<\\s*(\\w+)";
if ( regex_match(cur, result, pattern) ){}
pattern = "(\\w+)\\s*>\\s*(\\w+)";
if ( regex_match(cur, result, pattern) ){}
pattern = "(\\w+)\\s*<=\\s*(\\w+)";
if ( regex_match(cur, result, pattern) ){}
pattern = "(\\w+)\\s*>=\\s*(\\w+)";
if ( regex_match(cur, result, pattern) ){}
}
}
I'm guessing that the variable end (const sregex_token_iterator end;) changed in g(), or that the loop condition in the for statement failed after it++.
If so, how did that happen?
And what should I do to fix it?
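sregex_token_iterator stores a pointer to pattern, not a copy. By assigning to pattern inside the loop, you are changing the regular expression right out from under the iterator. Use a separate regex object for the regex_match calls and leave pattern untouched while the iterator is in use.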
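A minimal sketch of that fix, assuming it is acceptable to fold the six per-operator patterns into one alternation. The helper name split_conditions and the pair-based return type are made up for illustration and are not from the question:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every condition together with the operator found in it.
std::vector<std::pair<std::string, std::string>>
split_conditions(const std::string& str)
{
    // The regex driving the iterator. It is const and is never reassigned.
    const std::regex outer("\\w+\\s*(<>|<=|>=|<|=|>)\\s*\\w+");
    // A separate regex object for inspecting each condition.
    const std::regex inner("(\\w+)\\s*(<>|<=|>=|<|=|>)\\s*(\\w+)");

    std::vector<std::pair<std::string, std::string>> out;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(str.begin(), str.end(), outer); it != end; ++it)
    {
        const std::string cur = *it;      // text of the current condition
        std::smatch result;
        if (std::regex_match(cur, result, inner))
            out.emplace_back(cur, result[2].str());  // result[2] is the operator
    }
    return out;
}
```

Calling split_conditions(" where a <= 2 and b = 2") should yield both conditions, because the iterator's regex stays the same for the whole loop.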
I have no idea about boost, could anybody please tell me what exactly this function is doing?
int
Function(const string& tempStr)
{
boost::regex expression ("result = ");
std::string::const_iterator start, end;
start = tempStr.begin();
end = tempStr.end();
boost::match_results<std::string::const_iterator> what;
boost::regex_constants::_match_flags flags = boost::match_default;
int count = 0;
while(regex_search(start, end, what, expression, flags)){
start = what[0].second;
count++;
}
cout << "Count :"<< count << endl;
return count;
}
match_results is a collection of sub_match objects. The first sub_match object (index 0) represents the full match in the target sequence (subsequent entries correspond to the subexpression matches). Your code searches for occurrences of "result = " and restarts the search each time from the end of the previous match (what[0].second):
int
Function(const string& tempStr)
{
boost::regex expression ("result = ");
std::string::const_iterator start, end;
start = tempStr.begin();
end = tempStr.end();
boost::match_results<std::string::const_iterator> what;
boost::regex_constants::_match_flags flags = boost::match_default;
int count = 0;
while(regex_search(start, end, what, expression, flags)){
start = what[0].second;
count++;
}
cout << "Count :"<< count << endl;
return count;
}
int main()
{
Function("result = 22, result = 33"); // Prints "Count :2"
}
The base of the functionality is searching for a regular expression match on tempStr.
Look at the regex_search documentation and notice what the match_results object contains after the call finishes (that's the 3rd parameter, what in your code sample). From there, understanding the while loop should be straightforward.
This function is a complicated way to count the number of occurrences of the string "result = ". A simpler way would be:
boost::regex search_string("result = ");
auto begin = boost::make_regex_iterator(tempStr, search_string);
int count = std::distance(begin, {});
Which can be collapsed to a one-liner, with possible loss of readability.
This is a match-counting function.
The original contains unnecessary code; here is equivalent code using std (the same approach works with Boost):
unsigned int count_match( std::string user_string, const std::string& user_pattern ){
const std::regex rx( user_pattern );
std::regex_token_iterator< std::string::const_iterator > first( user_string.begin(), user_string.end(), rx ), last;
return std::distance( first, last );
}
With std::regex_search it can be written like this (again, the same works with Boost):
unsigned int match_count( std::string user_string, const std::string& user_pattern ){
unsigned int counter = 0;
std::match_results< std::string::const_iterator > match_result;
std::regex regex( user_pattern );
while( std::regex_search( user_string, match_result, regex ) ){
user_string = match_result.suffix().str();
++counter;
}
return counter;
}
NOTE:
There is no need for this part:
std::string::const_iterator start, end;
start = tempStr.begin();
end = tempStr.end();
Also,
boost::match_results<std::string::const_iterator> what;
can be written as
boost::smatch what; // a typedef of match_results<std::string::const_iterator>
There is no need for
boost::regex_constants::_match_flags flags = boost::match_default;
because regex_search uses this flag by default.
This line:
start = what[0].second;
advances the search position; the same effect can be had with:
user_string = match_result.suffix().str();
If you want to see what happens in the while loop, use this code:
std::cout << "prefix: '" << what.prefix().str() << "'\n";
std::cout << "match : '" << what.str() << "'\n";
std::cout << "suffix: '" << what.suffix().str() << "'\n";
std::cout << "------------------------------\n";
I'd like to do the C++11 equivalent of a Perl checked-replacement operation:
my $v = "foo.rat";
if ( $v =~ s/\.rat$/.csv/ )
{
...
}
I can do the replacement without trouble:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string s{ "foo.rat" } ;
std::regex reg{ R"((.*)\.rat$)" } ;
s = std::regex_replace( s, reg, "$1.csv" ) ;
std::cout << s << std::endl ;
s = std::regex_replace( "foo.noo", reg, "$1.csv" ) ;
std::cout << s << std::endl ;
return 0 ;
}
This gives:
foo.csv
foo.noo
Notice that the replace operation on the non-matching expression doesn't throw an error (which is what I expected).
Looking at the regex_replace documentation, it's not obvious to me how to check for the success of the replace operation. I could do a string compare, but that seems backwards?
Try to find a match with std::regex_match or std::regex_search, check whether anything matched, then replace the matched portion of the string using std::string::replace. That shouldn't cost any performance.
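A rough sketch of that approach, in case it helps. The helper name replace_first_checked is made up; it only handles the first match, and it uses match_results::format so that $1-style references in the replacement still work:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: replaces the first match of `re` in `s`.
// Returns true only if a match was found (and therefore replaced).
bool replace_first_checked(std::string& s, const std::regex& re, const std::string& fmt)
{
    std::smatch m;
    if (!std::regex_search(s, m, re))
        return false;                       // nothing matched, s is left untouched

    // Expand $1, $2, ... before touching s, because m refers into s.
    const std::string replacement = m.format(fmt);
    const std::size_t pos = static_cast<std::size_t>(m.position(0));
    const std::size_t len = static_cast<std::size_t>(m.length(0));
    s.replace(pos, len, replacement);
    return true;
}
```

With s = "foo.rat" and the regex \.rat$ this should return true and leave "foo.csv" in s; with "foo.noo" it should return false and leave the string alone, which mirrors the Perl if ( $v =~ s/.../.../ ) idiom.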
Just to add to the accepted answer: this can also be done with a std::regex_iterator, which is handy when multiple replacements may take place.
std::regex_iterator repeatedly calls std::regex_search() until all matches are found. If the begin iterator compares equal to the end iterator, no match was found.
The function bool regex_replace(std::string &str, const std::string &re, const std::string& replacement) implements this behaviour:
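#include <iostream>
#include <regex>
#include <string>
bool regex_replace(std::string &str, const std::string &re, const std::string& replacement) {
std::regex regexp(re);
//Search regex
std::sregex_iterator begin = std::sregex_iterator(str.begin(), str.end(), regexp);
std::sregex_iterator end = std::sregex_iterator();
//build the result in a separate string: modifying str in place would invalidate the iterators
std::string result;
std::string::const_iterator last = str.begin();
for (std::sregex_iterator i = begin; i != end; ++i) {
result.append(last, (*i)[0].first); //text before the match
result += replacement;
last = (*i)[0].second;
}
result.append(last, str.cend()); //text after the last match
const bool found = (begin != end);
str = result;
//returns true if at least one match was found and replaced
return found;
}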
This function operates in place: when it returns, str holds the result of the replacements. It returns true only if at least one replacement was made.
The following code shows how to use it to make multiple replacements and detect whether any were made:
int main(int argc, char** argv) {
std::string rgx("[0-9]");
std::string str("0a1b2c3d4e5");
std::string replacement("?");
bool found = regex_replace(str, rgx, replacement);
std::cout << "Found any: " << (found ? "true" : "false") << std::endl;
std::cout << "string: " << str << std::endl;
return 0;
}
The code replaces every digit with a question mark '?':
Found any: true
string: ?a?b?c?d?e?
Use the std::regex_constants::format_no_copy flag to change the behavior of regex_replace(); see the code below.
With this flag, the parts of the input that do not match are not copied to the output, so the returned string is empty if nothing matched.
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string s{ "foo.rat" } ;
std::regex reg{ R"((.*)\.rat$)" } ;
auto rxMatchFlag = std::regex_constants::format_no_copy; //<---use this to modify the behavior of regex_replace when matching failed.
s = std::regex_replace( s, reg, "$1.csv", rxMatchFlag) ;
if(!s.empty()) std::cout << s << std::endl ;
else std::cout << "failed match" << std::endl;
s = std::regex_replace( "foo.noo", reg, "$1.csv", rxMatchFlag) ;
if(!s.empty()) std::cout << s << std::endl ;
else std::cout << "failed match" << std::endl;
return 0 ;
}
The other flags are listed in the documentation for std::regex_constants::match_flag_type.
I don't believe there's any direct way to find out whether any replacements were made.
(Don't confuse this with "success / not success", which is not quite the same thing.)
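One indirect way, sketched here under the assumption that scanning the input twice is acceptable: count the matches with std::sregex_iterator first, then call std::regex_replace only if the count is non-zero. The name replace_all_counted is hypothetical:

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of `re` in `s` and returns
// how many matches there were (0 means s was left unchanged).
std::size_t replace_all_counted(std::string& s, const std::regex& re, const std::string& fmt)
{
    // First pass: count the matches.
    const auto n = std::distance(std::sregex_iterator(s.cbegin(), s.cend(), re),
                                 std::sregex_iterator());
    // Second pass: do the replacement only when there is something to replace.
    if (n > 0)
        s = std::regex_replace(s, re, fmt);
    return static_cast<std::size_t>(n);
}
```

The return value answers both "did anything change?" and "how many times?", at the cost of running the regex over the input twice.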
For this string [268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]
I want to get the numbers in [ , ] pair by pair.
I use c++ regex_search to parse this string.
This is my testing code:
ifstream file("output.txt");
char regex_base[] = "[\\[0-9, 0-9\\]]{10}";
char regex_num[] = "[0-9]{3}";
regex reg_base(regex_base, regex_constants::icase);
regex reg_num(regex_base, regex_constants::icase);
if (file.is_open())
{
string s;
while (!file.eof()){
getline(file, s);
smatch m;
while (regex_search(s, m, reg_num)) {
for (int i = 0; i < m.size(); i++)
cout << m[i] << endl;
}
}
}
But inside the while loop around regex_search(), the variable m only ever holds [268, 950], and the loop never terminates.
What's wrong with my regular expression or my code?
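The loop never terminates because s is never changed, so regex_search keeps finding the same first match; assigning m.suffix().str() to s after each match moves the search forward. Your pattern is also a single character class repeated 10 times, so it matches any 10 characters out of [, ], digits, comma and space rather than the [n, n] shape. I have removed the capturing groups since you do not seem to be using them anyway, and added some code to show how to obtain the matches from your input string: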
char regex_base[] = "\\[[0-9]+, [0-9]+\\]";
...
s = "[268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]"; // FOR TEST
smatch m;
while (regex_search(s, m, reg_num))
{
for (auto x:m) std::cout << x << "\r\n";
s = m.suffix().str();
}
Output:
[268, 950]
[268, 954]
[269, 955]
[272, 955]
[270, 955]
[268, 953]
If you need the values, you can use a different regex:
char regex_base[] = "\\[([0-9]+), ([0-9]+)\\]";
...
s = "[268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]";
smatch m;
while (regex_search(s, m, reg_num))
{
std::cout << m[1] << ", " << m[2] << std::endl;
s = m.suffix().str();
}
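If you would rather not copy the remaining string on every iteration, std::sregex_iterator can walk the matches directly. This is only a sketch; the function name parse_pairs and the choice of std::pair<int, int> as the result type are my own assumptions, not part of the question:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extracts every "[a, b]" pair from one line of input.
std::vector<std::pair<int, int>> parse_pairs(const std::string& line)
{
    // Same pattern as above: two captured numbers inside square brackets.
    static const std::regex pair_re("\\[([0-9]+), ([0-9]+)\\]");

    std::vector<std::pair<int, int>> pairs;
    for (std::sregex_iterator it(line.begin(), line.end(), pair_re), end; it != end; ++it)
    {
        // (*it)[1] and (*it)[2] are the two captured numbers of the current match.
        pairs.emplace_back(std::stoi((*it)[1].str()), std::stoi((*it)[2].str()));
    }
    return pairs;
}
```

Each line read with getline could then be passed to parse_pairs, and the iterator advances by itself, so there is no way to forget the step that caused the infinite loop.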
For example, if I have a string like "first second third forth", I want to match every single word in one operation and output them one by one.
I thought that "(\\b\\S*\\b){0,}" would work, but it did not.
What should I do?
Here's my code:
#include<iostream>
#include<regex>
#include<string>
using namespace std;
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
regex_search(str, res, exp);
cout << res[0] <<" "<<res[1]<<" "<<res[2]<<" "<<res[3]<< endl;
}
Simply iterate over your string while regex_searching, like this:
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
string::const_iterator searchStart( str.cbegin() );
while ( regex_search( searchStart, str.cend(), res, exp ) )
{
cout << ( searchStart == str.cbegin() ? "" : " " ) << res[0];
searchStart = res.suffix().first;
}
cout << endl;
}
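The same loop can also be written with std::sregex_iterator, which keeps track of the search position for you. A small sketch, assuming that a "word" simply means a run of non-whitespace characters (hence \S+ instead of the pattern from the question) and using a made-up function name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of non-whitespace characters.
std::vector<std::string> split_words(const std::string& text)
{
    static const std::regex word_re("\\S+");

    std::vector<std::string> words;
    // A default-constructed sregex_iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end; it != end; ++it)
        words.push_back(it->str());   // it->str() is the text of the whole match
    return words;
}
```

Using \S+ rather than \S* also means the regex can never produce an empty match, so there is no risk of the search getting stuck on one.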
This can be done with the regex library of C++11.
Two methods:
1. You can use () in the regex to define your captures (sub-expressions), like this:
string var = "first second third forth";
const regex r("(.*) (.*) (.*) (.*)");
smatch sm;
if (regex_search(var, sm, r)) {
for (size_t i=1; i<sm.size(); i++) {
cout << sm[i] << endl;
}
}
See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7
2. You can use sregex_token_iterator():
string var = "first second third forth";
regex wsaq_re("\\s+");
copy( sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
sregex_token_iterator(),
ostream_iterator<string>(cout, "\n"));
See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0
sregex_token_iterator appears to be the ideal, efficient solution, but the example given in the selected answer leaves much to be desired. Instead, I found some great examples here:
http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/
For your convenience, I've copy-pasted the sample code shown by that page. I claim no credit for the code.
// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this subject has a submarine as a subsequence");
std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"
// default constructor = end-of-sequence:
std::regex_token_iterator<std::string::iterator> rend;
std::cout << "entire matches:";
std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
while (a!=rend) std::cout << " [" << *a++ << "]";
std::cout << std::endl;
std::cout << "2nd submatches:";
std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), e, 2 );
while (b!=rend) std::cout << " [" << *b++ << "]";
std::cout << std::endl;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::regex_token_iterator<std::string::iterator> c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
std::cout << "matches as splitters:";
std::regex_token_iterator<std::string::iterator> d ( s.begin(), s.end(), e, -1 );
while (d!=rend) std::cout << " [" << *d++ << "]";
std::cout << std::endl;
return 0;
}
Output:
entire matches: [subject] [submarine] [subsequence]
2nd submatches: [ject] [marine] [sequence]
1st and 2nd submatches: [sub] [ject] [sub] [marine] [sub] [sequence]
matches as splitters: [this ] [ has a ] [ as a ]
You could use the suffix() function, and search again until you don't find a match:
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
while (regex_search(str, res, exp)) {
cout << res[0] << endl;
str = res.suffix();
}
}
My code will capture all groups in all matches:
vector<vector<string>> U::String::findEx(const string& s, const string& reg_ex, bool case_sensitive)
{
// icase must be set when the search is NOT case sensitive
regex rx(reg_ex, case_sensitive ? regex_constants::ECMAScript
: (regex_constants::ECMAScript | regex_constants::icase));
vector<vector<string>> captured_groups;
// sregex_iterator yields a complete match_results for every match,
// so the groups can be read directly without searching a second time
for (sregex_iterator i(s.cbegin(), s.cend(), rx), end_i; i != end_i; ++i)
{
vector<string> captured_subgroups;
for (size_t j = 0; j < i->size(); ++j)
captured_subgroups.push_back((*i)[j].str());
captured_groups.push_back(captured_subgroups);
}
return captured_groups;
}
My reading of the documentation is that regex_search searches for the first match and that none of the functions in std::regex do a "scan" like the one you are looking for. However, the Boost library seems to support this, as described in "C++ tokenize a string using a regular expression".
How can I split a string with Boost with a regex AND have the delimiter included in the result list?
for example, if I have the string "1d2" and my regex is "[a-z]" I want the results in a vector with (1, d, 2)
I have:
std::string expression = "1d2";
boost::regex re("[a-z]");
boost::sregex_token_iterator i (expression.begin (),
expression.end (),
re);
boost::sregex_token_iterator j;
std::vector <std::string> splitResults;
std::copy (i, j, std::back_inserter (splitResults));
Thanks
I think you cannot directly extract the delimiters using boost::regex. You can, however, extract the position where the regex is found in your string:
std::string expression = "1a234bc";
boost::regex re("[a-z]");
boost::sregex_iterator i(
expression.begin (),
expression.end (),
re);
boost::sregex_iterator j;
for(; i!=j; ++i) {
std::cout << (*i).position() << " : " << (*i) << std::endl;
}
This example would show:
1 : a
5 : b
6 : c
Using this information, you can extract the delimiters from your original string:
std::string expression = "1a234bc43";
boost::regex re("[a-z]");
boost::sregex_iterator i(
expression.begin (),
expression.end (),
re);
boost::sregex_iterator j;
size_t pos=0;
for(; i!=j;++i) {
std::string pre_delimiter = expression.substr(pos, (*i).position()-pos);
std::cout << pre_delimiter << std::endl;
std::cout << (*i) << std::endl;
pos = (*i).position() + (*i).length(); // length() is the matched text's length; size() would be the number of sub-matches
}
std::string last_delimiter = expression.substr(pos);
std::cout << last_delimiter << std::endl;
This example would show:
1
a
234
b

c
43
There is an empty line between b and c because no text separates those two delimiters.
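For completeness, here is the same idea collected into a vector, as the question asked. Note that this sketch uses std::regex instead of boost::regex so that it is self-contained; that swap is my assumption, and the function name split_keep_delims is made up:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on `delim` and keeps the delimiters
// in the result, in the order in which they appear.
std::vector<std::string> split_keep_delims(const std::string& text, const std::regex& delim)
{
    std::vector<std::string> parts;
    std::string::const_iterator last = text.begin();   // end of the previous delimiter

    for (std::sregex_iterator it(text.begin(), text.end(), delim), end; it != end; ++it)
    {
        const std::ssub_match& m = (*it)[0];   // the whole delimiter match
        parts.emplace_back(last, m.first);     // text before the delimiter (may be empty)
        parts.push_back(m.str());              // the delimiter itself
        last = m.second;
    }
    parts.emplace_back(last, text.end());      // text after the last delimiter
    return parts;
}
```

For "1d2" with the regex [a-z] this should give the three elements 1, d and 2. Empty pieces between adjacent delimiters are kept, so filter them out afterwards if they are not wanted.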