C++ std::regex: How to fix the error_complexity?

I use std::regex to match a string.
My regex is defined as:
regex reg("(-?\\d+,?){2,}", regex::icase);
The test string is:
5,3240,7290,11340,-3240,-7290,-11340
I call std::regex_match() on it.
This is the error I get:
regex_error(error_complexity): The complexity of an attempted match
against a regular expression exceeded a pre-set level.
How can I fix the problem? My compiler is VS2013.

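You may "unroll" the group containing ,? into a more linear pattern to reduce complexity: ",?-?\\d+(?:,-?\\d+)+". Because the comma is optional in the original pattern, a run of digits such as 11340 can be split between iterations of the group in many ways, which multiplies the paths a backtracking engine may explore. With a mandatory comma between numbers, each number can be matched in only one way.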
See C++ demo:
#include <iostream>
#include <regex>
using namespace std;
int main() {
    regex reg(",?-?\\d+(?:,-?\\d+)+");
    string s("5,3240,7290,11340,-3240,-7290,-11340");
    if (regex_match(s, reg)) {
        std::cout << "Matched!";
    }
    return 0;
}
Now, the pattern matches:
,? - an optional comma
-? - an optional hyphen
\\d+ - 1 or more digits
(?:,-?\\d+)+ - 1 or more sequences matching
, - a comma
-?\\d+ - see above.

Related

RE2 Nested Regex Group Match

I have a RE2 regex as following
const re2::RE2 numRegex("(([0-9]+),)+([0-9])+");
std::string inputStr;
inputStr = "apple with make,up things $312,412,3.00";
RE2::Replace(&inputStr, numRegex, "$1$3");
cout << inputStr;
Expected
apple with make,up things $3124123.00
I am trying to remove the commas inside the recognized number, but $1 captures only one of the repeated parts rather than both 312 and 412. How can I extract every repetition of the repeated group?
Note that RE2 doesn't support lookahead (see Using positive-lookahead (?=regex) with re2) and the solutions I found all use lookaheads.
RE2 based solution
As RE2 does not support lookarounds, there is no pure single-pass regex solution.
You can use a workaround (as usual, when no single-pattern solution is available): run the replacement twice with the (\d),(\d) regex and the \1\2 rewrite string. RE2 rewrite strings refer to groups as \1, \2 rather than $1, $2, and RE2::Replace only replaces the first match, so RE2::GlobalReplace is used here:
const re2::RE2 numRegex(R"((\d),(\d))");
std::string inputStr("apple with make,up things $312,412,3.00");
RE2::GlobalReplace(&inputStr, numRegex, "\\1\\2");
RE2::GlobalReplace(&inputStr, numRegex, "\\1\\2"); // <- Second pass to remove commas in 1,2,3,4 like strings
std::cout << inputStr;
C++ std::regex based solution:
You can remove the commas between digits using
std::string inputStr("apple with make,up things $312,412,3.00");
std::regex numRegex(R"((\d),(?=\d))");
std::cout << regex_replace(inputStr, numRegex, "$1") << "\n";
// => apple with make,up things $3124123.00
See the C++ demo. Also, see the regex demo here.
Details:
(\d) - Capturing group 1 ($1): a digit
, - a comma
(?=\d) - a positive lookahead that requires a digit immediately to the right of the current location.
In the pattern that you tried, the outer group (([0-9]+),)+ is repeated, so after the match it holds only the value of its last iteration (1+ digits and a comma).
For 312,412, the last iteration captures 412, while 312, is matched but not kept in the group.
You are using RE2, but as an alternative, if you have Boost available, you could use the \G anchor, which asserts the position at the end of the previous match and so allows iterative matches, and replace each matched comma with an empty string.
(?:\$|\G(?!^))\d+\K,(?=\d)
The pattern matches:
(?: Non capture group
\$ match $
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capture group
\d+\K Match 1+ digits and forget what is matched so far
,(?=\d) Match a comma and assert a digit directly to the right
Regex demo
#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;
int main()
{
    std::string inputStr = "apple with make,up things $312,412,3.00";
    boost::regex numRegex("(?:\\$|\\G(?!^))\\d+\\K,(?=\\d)");
    std::string result = boost::regex_replace(inputStr, numRegex, "");
    std::cout << result << std::endl;
}
Output
apple with make,up things $3124123.00

How to consider taking dot in the number in regular expression

Take a look at the following regular expression
std::regex reg("[A][-+]?([0-9]*\\.[0-9]+|[0-9]+)");
This finds the letter A followed by a floating-point number. The problem is that for an input such as A30., the regular expression ignores the dot and prints A30. I would like the match to include the trailing decimal dot as well. Is this feasible?
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main()
{
    std::string line("A50. hsih Y0 his ");
    std::smatch match;
    std::regex reg("[A][-+]?([0-9]*\\.[0-9]+|[0-9]+)");
    if ( std::regex_search(line,match,reg) ){
        cout << match.str(0) << endl;
    }else{
        cout << "nothing found" << endl;
    }
    return 0;
}
You require the dot to be followed by one or more (+) digits. Just make the trailing digits optional by changing it to:
std::regex reg("[A][-+]?([0-9]*\\.[0-9]*|[0-9]+)");
Demo
The only problem with this expression is that it would also match A followed by a single dot without any digit. I don't know if you'd see this as a valid match. A more robust alternative would hence be:
std::regex reg("[A][-+]?([0-9]*\\.[0-9]+|[0-9]+\\.?)");
So either optional digits, a dot and trailing digits, or digits optionally followed by a dot.
Second demo
You can change your regex like this
A[-+]?(?:[0-9]*\\.?(?:[0-9]+)?)
A - Matches A.
[-+]? - Matches + or -. ( ? makes it optional)
(?:[0-9]*\\.?(?:[0-9]+)?)
[0-9]*\\.? - Matches zero or more digits followed by a dot. ( ? makes the dot optional)
(?:[0-9]+)? - Matches one or more digits. ( ? makes the group optional)
Demo

How to select the complete word within the brackets even if it have that brackets within word

Please suggest a solution for the following example.
Scenario-1:
My string: Password={my_pswd}}123}
I want to select the value enclosed within the {} brackets (for example, I want to select the complete password value {my_pswd}}123}, not {my_pswd}).
If I use the regex \{(.*?)\}, it selects {my_pswd}, not {my_pswd}}123}. How can I get the complete word even if the word has } in between? Suggestions using regex or any other approach are welcome.
Scenario-2:
I am using the regex ^\{|\}$. If my string has both the { bracket and the } bracket, like {{my_password}}, then it should select only the first and last bracket. If my string is like {{my_password, it should not select the starting bracket. It is like an AND condition in regex. I have read many posts that do this with lookarounds, but I could not get a clear idea. Please give me some suggestions.
Thanks.
It seems that the {...} substrings you want to match must be followed by ; or the end of the string.
Note that this approach will not work when a } inside the value can itself be followed by ;.
You may solve the first issue by adding a (?![^;]) negative lookahead:
\{(.*?)\}(?![^;])
See the regex demo.
Details
\{ - a { char
(.*?) - Group 1: any 0+ chars as few as possible
\} - a } char
(?![^;]) - no char other than ; is allowed right after the current position
See the C++ demo:
#include <iostream>
#include <vector>
#include <regex>
int main() {
    const std::regex reg("\\{(.*?)\\}(?![^;])");
    std::smatch match;
    std::string s = "Username={My_{}user};Password={my_pswd}}123}}}kk};Password={my_pswd}}123}";
    std::vector<std::string> results(
        std::sregex_token_iterator(s.begin(), s.end(), reg, 1), // See 1, it extracts Group 1 value
        std::sregex_token_iterator());
    for (auto result : results)
    {
        std::cout << result << std::endl;
    }
    return 0;
}
Output:
My_{}user
my_pswd}}123}}}kk
my_pswd}}123
As for the second scenario, you may use
std::regex reg("^\\{([^]*)\\}$");
std::string s = "{My_{}user}";
std::cout << regex_replace(s, reg, "$1") << std::endl; // => My_{}user
See another C++ demo.
The ^\{([^]*)\}$ pattern matches the { at the start (^) of the string, then matches and captures into Group 1 (later referenced with the help of $1 in the replacement pattern) any 0+ chars, as many as possible, and then matches a } at the end of the string ($).

c++ regexp allowing digits separated by dot

I need a regexp that allows groups of up to two digits separated by dots, like 1.2 or 1.2.3 or 1.2.3.45 etc., but not 1234 or 1.234 etc. I'm trying this "^[\d{1,2}.]+", but it allows all numbers. What's wrong?
You may try this:
^\d{1,2}(\.\d{1,2})+$
Regex 101 Demo
Explanation:
^ start of a string
\d{1,2} followed by one or two digits
( start of capture group
\.\d{1,2} followed by a dot and one or two digits
) end of capture group
+ indicates the previous capture group be repeated 1 or more times
$ end of string
Sample C++ Source (run here):
#include <regex>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    string regx = R"(^\d{1,2}(\.\d{1,2})+$)";
    string input = "1.2.346";
    smatch matches;
    if (regex_search(input, matches, regex(regx)))
    {
        cout<<"match found";
    }
    else
        cout<<"No match found";
    return 0;
}
I think the last should not have more than 2 digits.
(\d{1,2}\.)+\d{1,2}(?=\b)

Using RegEx to filter wrong Input?

Look at this example:
string str = "January 19934";
The outcome should be
Jan 1993
I think I have created the right regex, ([A-z]{3}).*([\d]{4}), for this case, but I do not know what to do next.
How can I extract what I am looking for using the regex? Is there a way to receive two variables, the first being the result of the first bracketed group, ([A-z]{3}), and the second being the result of the second bracketed group, ([\d]{4})?
Your regex contains a common typo: [A-z] matches more than just ASCII letters. Also, the .* will grab the whole string up to its end, and backtracking will force \d{4} to match the last 4 digits. You need to use a lazy quantifier with the dot, *?.
Then, use regex_search and concatenate the two group values:
#include <regex>
#include <string>
#include <sstream>   // std::stringstream
#include <iostream>
using namespace std;
int main() {
    regex r("([A-Za-z]{3}).*?([0-9]{4})");
    string s("January 19934");
    smatch match;
    std::stringstream res("");
    if (regex_search(s, match, r)) {
        res << match.str(1) << " " << match.str(2);
    }
    cout << res.str(); // => Jan 1993
    return 0;
}
See the C++ demo
Pattern explanation:
([A-Za-z]{3}) - Group 1: three ASCII letters
.*? - any 0+ chars other than line break symbols as few as possible
([0-9]{4}) - Group 2: 4 digits
This could work:
([A-Za-z]{3})([a-z ])+([\d]{4})
Note that the space after a-z is important: it lets the middle group match the space before the year. With this pattern the year is in group 3.