c++ regex_replace not doing intended substitution

The following code is intended to convert the )9 in the first line to a )*9.
Instead, the last line prints the original string unmodified.
std::string ss ("1 + (3+2)9 - 2 ");
std::regex ee ("(\\)\\d)([^ ]");
std::string result;
std::regex_replace (std::back_inserter(result), ss.begin(), ss.end(), ee, ")*$2");
std::cout << result;
This is based on a very similar example at: http://www.cplusplus.com/reference/regex/regex_replace/
MS Visual Studio Express 2013.

I see two issues: first, your capture group should only include the '9' portion of the string, and second the group you want to use for replacement is not $2, but $1:
std::string ss ("1 + (3+2)9 - 2 ");
static const std::regex ee ("\\)(\\d)");
std::string result;
std::regex_replace (std::back_inserter(result), ss.begin(), ss.end(), ee, ")*$1");
std::cout << result;
Output:
1 + (3+2)*9 - 2
Edit
It appears that you want a more general replacement.
That is, wherever a number is followed by an open paren, e.g. 1(, or a close paren is followed by a number, e.g. )1, you want an asterisk between the number and the paren.
In C++ we can do this with regex_replace, but at the time of writing we need two calls. We can chain them together:
std::string ss ("1 + 7(3+2)9 - 2");
static const std::regex ee ("\\)(\\d+)");
static const std::regex e2 ("(\\d+)\\(");
std::string result;
std::regex_replace (std::back_inserter(result), ss.begin(), ss.end(), ee, ")*$1");
result = std::regex_replace (result, e2, "$1*(");
std::cout << result;
Output:
1 + 7*(3+2)*9 - 2
Edit 2
Since you asked in another question how to turn this into one that can also capture spaces, here is a slight modification to handle possible spaces between the number and paren chars:
static const std::regex ee ("\\)\\s*(\\d+)");
static const std::regex e2 ("(\\d+)\\s*\\(");
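As a rough sketch, the two whitespace-tolerant patterns above could be wrapped in a single helper. The function name insert_implicit_multiplication is hypothetical and not part of the original answer; it only chains the two regex_replace calls already shown.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (the name is an assumption): inserts '*' between a
// number and an adjacent parenthesis, allowing optional whitespace between them.
std::string insert_implicit_multiplication(const std::string& expr) {
    static const std::regex close_then_number("\\)\\s*(\\d+)");
    static const std::regex number_then_open("(\\d+)\\s*\\(");

    // First pass: ")9" or ") 9" becomes ")*9".
    std::string result = std::regex_replace(expr, close_then_number, ")*$1");
    // Second pass: "7(" or "7 (" becomes "7*(".
    return std::regex_replace(result, number_then_open, "$1*(");
}
```

Calling it with "1 + 7 (3+2) 9 - 2" should give the same result as the unspaced input from the earlier edit.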

Related

tokenize a c++ string with regex having special characters

I am trying to find the tokens in a string, which has words, numbers, and special chars. I tried the following code:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
int main() {
string str("The ,quick brown. fox \"99\" named quick_joe!");
regex reg("[\\s,.!\"]+");
sregex_token_iterator iter(str.begin(), str.end(), reg, -1), end;
vector<string> vec(iter, end);
for (auto a : vec) {
cout << a << ":";
}
cout << endl;
}
And got the following output:
The:quick:brown:fox:99:named:quick_joe:
But I wanted the output:
The:,:quick:brown:.:fox:":99:":named:quick_joe:!:
What regex should I use for that? I would like to stick to standard C++ if possible, i.e. I would not like a solution that uses Boost.
(See 43594465 for a Java version of this question, but now I am looking for a C++ solution. So essentially, the question is how to map Java's Matcher and Pattern to C++.)

You're asking to interleave non-matched substrings (submatch -1) with the whole matched substrings (submatch 0), which is slightly different:
sregex_token_iterator iter(str.begin(), str.end(), reg, {-1,0}), end;
This yields:
The: ,:quick: :brown:. :fox: ":99:" :named: :quick_joe:!:
Since you're looking to just drop whitespace, change the regex to consume surrounding whitespace, and add a capture group for the non-whitespace chars. Then, just specify submatch 1 in the iterator, instead of submatch 0:
regex reg("\\s*([,.!\"]+)\\s*");
sregex_token_iterator iter(str.begin(), str.end(), reg, {-1,1}), end;
Yields:
The:,:quick brown:.:fox:":99:":named quick_joe:!:
Splitting the spaces between adjoining words requires splitting on 'just spaces' too:
regex reg("\\s*\\s|([,.!\"]+)\\s*");
However, you'll end up with empty submatches:
The:::,:quick::brown:.:fox:::":99:":named::quick_joe:!:
Easy enough to drop those:
regex reg("\\s*\\s|([,.!\"]+)\\s*");
sregex_token_iterator iter(str.begin(), str.end(), reg, {-1,1}), end;
vector<string> vec;
copy_if(iter, end, back_inserter(vec), [](const string& x) { return x.size(); });
Finally:
The:,:quick:brown:.:fox:":99:":named:quick_joe:!:

If you want to use the approach used in the Java related question, just use a matching approach here, too.
regex reg(R"(\d+|[^\W\d]+|[^\w\s])");
sregex_token_iterator iter(str.begin(), str.end(), reg), end;
vector<string> vec(iter, end);
Result: The:,:quick:brown:.:fox:":99:":named:quick_joe:!:. Note that this won't match Unicode letters, since \w (and \d and \s, too) is not Unicode-aware in std::regex.
Pattern details:
\d+ - 1 or more digits
| - or
[^\W\d]+ - 1 or more ASCII letters or _
| - or
[^\w\s] - 1 char other than an ASCII letter/digit,_ and whitespace.
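For reference, here is a minimal sketch of the matching approach as a self-contained function. The name tokenize is hypothetical; the pattern is the one described above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption): returns every token
// matched by the pattern, in order of appearance.
std::vector<std::string> tokenize(const std::string& text) {
    // digits | ASCII letters and underscores | one other non-space character
    static const std::regex token(R"(\d+|[^\W\d]+|[^\w\s])");
    // With no submatch argument, the iterator yields the whole match (submatch 0).
    std::sregex_token_iterator first(text.begin(), text.end(), token), last;
    return std::vector<std::string>(first, last);
}
```

Because only matches are collected, whitespace never shows up in the result and no empty strings need to be filtered out.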

c++ regex expression error (unhandled exception)

I am trying to select the )9 in the string ss,
to replace it with )*9
I am getting an unhandled exception at the 2nd line (the definition of ee).
I have tried all the combinations of line 2 I can think of (including double-escaping the d).
std::string ss ("1 + (3+2)9 - 2 ");
std::regex ee ("(\\)\d)([^ ]");
std::string result;
std::regex_replace (std::back_inserter(result), ss.begin(), ss.end(), ee, "*$2");
std::cout << result;

You need to escape all your backslashes:
"(\\)\\d)([^ ]"
Otherwise the compiler treats \d as a string-literal escape sequence (an unrecognized one, which most compilers warn about), so the regex never receives the two characters \ and d.
Also, you either need a regex backslash for the second (:
"(\\)\\d)\\([^ ]"
or you need to add a matching close-parenthesis:
"(\\)\\d)([^ ])"
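The "unhandled exception" itself is a std::regex_error, which the std::regex constructor throws for a malformed pattern such as one with an unbalanced parenthesis. A minimal sketch of catching it follows; the helper name is_valid_pattern is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (the name is an assumption): reports whether a
// pattern compiles as a std::regex.
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex compiled(pattern);
        return true;
    } catch (const std::regex_error&) {
        // Thrown for malformed patterns, e.g. an unbalanced parenthesis.
        return false;
    }
}
```

With this, the original pattern "(\\)\\d)([^ ]" should be rejected, while both corrected variants above should be accepted.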

C++ Regex: non-greedy match

I'm currently trying to make a regex which matches URL parameters and extracts them.
For example, if I got the following parameters string ?param1=someValue&param2=someOtherValue, std::regex_match should extract the following contents:
param1
someValue
param2
someOtherValue
After trying different regex patterns, I finally built one corresponding to what I want: std::regex("(?:[\\?&]([^=&]+)=([^=&]+))*").
If I take the previous example, std::regex_match matches as expected. However, it does not extract the expected values, keeping only the last captured values.
For example, the following code:
std::regex paramsRegex("(?:[\\?&]([^=&]+)=([^=&]+))*");
std::string arg = "?param1=someValue&param2=someOtherValue";
std::smatch sm;
std::regex_match(arg, sm, paramsRegex);
for (const auto &match : sm)
std::cout << match << std::endl;
will give the following output:
param2
someOtherValue
As you can see, param1 and its value are skipped and not captured.
After searching on google, I've found that this is due to greedy capture and I have modified my regex into "(?:[\\?&]([^=&]+)=([^=&]+))\\*?" in order to enable non-greedy capturing.
This regex works well when I try it on rubular but it does not match when I use it in C++ (std::regex_match returns false and nothing is captured).
I've tried different std::regex_constants options (different regex grammar by using std::regex_constants::grep, std::regex_constants::egrep, ...) but the result is the same.
Does someone know how to do non-greedy regex capture in C++?

As Casimir et Hippolyte explained in his comment, I just need to:
remove the quantifier
Use std::regex_iterator
It gives me the following code:
std::regex paramsRegex("[\\?&]([^=]+)=([^&]+)");
std::string url_params = "?key1=val1&key2=val2&key3=val3&key4=val4";
std::smatch sm;
auto params_it = std::sregex_iterator(url_params.cbegin(), url_params.cend(), paramsRegex);
auto params_end = std::sregex_iterator();
while (params_it != params_end) {
auto param = params_it->str();
std::regex_match(param, sm, paramsRegex);
for (const auto &s : sm)
std::cout << s << std::endl;
++params_it;
}
And here is the output:
?key1=val1
key1
val1
&key2=val2
key2
val2
&key3=val3
key3
val3
&key4=val4
key4
val4
The original regex (?:[\\?&]([^=&]+)=([^=&]+))* was just changed into [\\?&]([^=]+)=([^&]+).
Then, by using std::sregex_iterator, I get an iterator over each match (?key1=val1, &key2=val2, ...).
Finally, by calling std::regex_match on each substring, I can retrieve the parameter values.
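As a variation, the second std::regex_match call can probably be avoided, because each dereferenced std::sregex_iterator is already a std::smatch with its capture groups filled in. A minimal sketch, where the function name parse_query and the pair-based return type are assumptions:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (name and return type are assumptions): collects
// (key, value) pairs from a query string such as "?a=1&b=2".
std::vector<std::pair<std::string, std::string>>
parse_query(const std::string& query) {
    static const std::regex param("[\\?&]([^=]+)=([^&]+)");
    std::vector<std::pair<std::string, std::string>> params;

    // Each dereferenced iterator is a std::smatch for one key=value pair,
    // so groups 1 and 2 can be read from it directly.
    for (std::sregex_iterator it(query.cbegin(), query.cend(), param), end;
         it != end; ++it) {
        params.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return params;
}
```

Returning pairs keeps each key next to its value, instead of printing the whole match and the groups line by line.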

Try to use match_results::prefix/suffix:
string match_expression("your expression");
smatch result;
regex fnd(match_expression, regex_constants::icase);
while (regex_search(in_str, result, fnd, std::regex_constants::match_any))
{
for (size_t i = 1; i < result.size(); i++)
{
std::cout << result[i].str();
}
in_str = result.suffix();
}

Extract numbers from string (Regex C++)

Let's say I have a string S = "1 this is a number=200; Val+54 4class find57"
I want to use a regex to extract only these numbers:
num[1] = 1
num[2] = 200
num[3] = 54
and not the 4 in "4class" or the 57 in "find57"; that is, only numbers that are surrounded by operators or spaces.
I tried this code, but it gives no results:
std::string str = "1 this is a number=200; Val+54 4class find57";
boost::regex re("(\\s|\\-|\\*|\\+|\\/|\\=|\\;|\n|$)([0-9]+)(\\s|\\-|\\*|\\+|\\/|\\;|\n|$)");
boost::sregex_iterator m1(str.begin(), str.end(), re);
boost::sregex_iterator m2;
for (; m1 != m2; ++m1) {
advm1->Lines->Append((*m1)[1].str().c_str());
}
By the way, I am using C++ Builder XE6.

Just use word boundaries. \b matches between a word character and a non-word character.
\b\d+\b
OR
\b[0-9]+\b
Escape the backslash one more time if necessary like \\b\\d+\\b or \\b[0-9]+\\b
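A minimal sketch of the word-boundary approach follows. It assumes std::regex rather than the Boost classes used in the question, and the function name extract_numbers is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption): returns every run of
// digits that has a word boundary on both sides.
std::vector<std::string> extract_numbers(const std::string& text) {
    static const std::regex number("\\b\\d+\\b");
    std::vector<std::string> numbers;
    for (std::sregex_iterator it(text.begin(), text.end(), number), end;
         it != end; ++it) {
        numbers.push_back(it->str());
    }
    return numbers;
}
```

For the string in the question this should yield 1, 200 and 54. The 4 in "4class" and the 57 in "find57" are skipped because a letter sits directly next to the digits, so there is no boundary on that side.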

c++ regex substring wrong pattern found

I'm trying to understand the logic on the regex in c++
std::string s ("Ni Ni Ni NI");
std::regex e ("(Ni)");
std::smatch sm;
std::regex_search (s,sm,e);
std::cout << "string object with " << sm.size() << " matches\n";
Shouldn't this give me the number of substrings matching my pattern? It always gives me 1 match and says that the match is [Ni , Ni], but I need it to find every occurrence of the pattern; there should be 3, like this: [Ni][Ni][Ni]

The function std::regex_search only returns the results for the first match found in your string.
Here is some code, merged from yours and from cplusplus.com. The idea is to search for the first match, analyze it, and then start again using the rest of the string (that is to say, the substring that directly follows the match that was found, which can be retrieved thanks to match_results::suffix).
Note that the regex has two capturing groups (Ni*) and ([^ ]*).
std::string s("the knights who say Niaaa and Niooo");
std::smatch m;
std::regex e("(Ni*)([^ ]*)");
while (std::regex_search(s, m, e))
{
for (auto x : m)
std::cout << x.str() << " ";
std::cout << std::endl;
s = m.suffix().str();
}
This gives the following output:
Niaaa Ni aaa
Niooo Ni ooo
As you can see, for every call to regex_search, we have the following information:
the content of the whole match,
the content of every capturing group.
Since we have two capturing groups, this gives us 3 strings for every regex_search.
EDIT: in your case if you want to retrieve every "Ni", all you need to do is to replace
std::regex e("(Ni*)([^ ]*)");
with
std::regex e("(Ni)");
You still need to iterate over your string, though.
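One possible alternative to the suffix loop is std::sregex_iterator, which walks over all non-overlapping matches without copying the rest of the string each time. A minimal sketch, where the function name count_matches is hypothetical:

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper (the name is an assumption): counts the
// non-overlapping matches of a pattern in text.
std::ptrdiff_t count_matches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    // A default-constructed iterator marks the end of the match sequence.
    std::sregex_iterator first(text.begin(), text.end(), re), last;
    return std::distance(first, last);
}
```

For "Ni Ni Ni NI" and the pattern "Ni" this should return 3. Note that sm.size() in the question counts the sub-expressions of a single match (the whole match plus one capture group), not the number of matches in the string.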
