Regex error at runtime Visual Studio 2019 [duplicate] - c++

I would like to use regular expression from here:
https://www.rfc-editor.org/rfc/rfc3986#appendix-B
I am trying to compile it like this:
#include <regex.h>
...
regex_t regexp;
if((regcomp(&regexp, "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", REG_EXTENDED)) != 0){
return SOME_ERROR;
}
But I am stuck with return value of regcomp:
REG_BADRPT
According to man it means:
Invalid use of repetition operators such as using * as the first character.
Similar meaning at this man:
?, * or + is not preceded by valid regular expression
I wrote a parser using my own regular expression, but I would like to test this one too, since it's officially in the RFC. I do not intend to use it for validation though.

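As Oli Charlesworth suggested, you need to escape the backslash in front of the question mark: write \\? instead of \?. In a C++ string literal, \? is itself an escape sequence that yields a plain ?, so the backslash never reaches regcomp. See C++ escape sequences for more information.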
test program
#include <regex.h>
#include <iostream>
void test_regcomp(const char *rx){
regex_t regexp;
if((regcomp(&regexp, rx, REG_EXTENDED)) != 0){
std::cout << "ERROR :" << rx <<"\n";
}
else{
std::cout << " OK :"<< rx <<"\n";
regfree(&regexp); // release the compiled pattern
}
}
int main()
{
// String literals are const in C++, so the pointers must be const char *.
const char *rx1 = "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?" ;
const char *rx2 = "^(([^:/\?#]+):)\?(//([^/\?#]*))\?([^\?#]*)(\\\?([^#]*))\?(#(.*))\?" ;
test_regcomp(rx1);
test_regcomp(rx2);
return 0;
}
output
ERROR :^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(?([^#]*))?(#(.*))?
OK :^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
The \? in your regex is the source of the REG_BADRPT error. The C++ compiler converts it to ?, so regcomp sees (?, a repetition operator with nothing before it. If you replace it with \\?, regcomp will be able to compile your regex.
"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"
OK :^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
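A raw string literal avoids the escaping problem altogether, because the compiler passes \? through untouched. The following is only a sketch: it swaps regcomp for std::regex (which is not what the question uses), and the names UriParts and split_uri are made up for illustration. The group numbers follow RFC 3986, appendix B.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the five components named in RFC 3986, appendix B.
struct UriParts {
    std::string scheme, authority, path, query, fragment;
};

// Hypothetical helper: splits a URI reference with the appendix B regex.
UriParts split_uri(const std::string& uri) {
    // Raw string literal: the regex's \? reaches the regex engine unchanged.
    static const std::regex rx(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    UriParts parts;
    if (std::regex_match(uri, m, rx)) {
        // Groups that did not participate in the match yield empty strings.
        parts.scheme    = m[2].str();
        parts.authority = m[4].str();
        parts.path      = m[5].str();
        parts.query     = m[7].str();
        parts.fragment  = m[9].str();
    }
    return parts;
}
```

For the RFC's own example, http://www.ics.uci.edu/pub/ietf/uri/#Related, this gives the scheme http, the authority www.ics.uci.edu, the path /pub/ietf/uri/, an empty query and the fragment Related.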

Related

Composing complex regular expressions with "DEFINED" subexpressions in C++

I'm trying to write a regular expression in C++ to match a base64 encoded string. I'm quite familiar with writing complex regular expressions in Perl so I started with that:
use strict;
use warnings;
my $base64_regex = qr{
(?(DEFINE)
(?<B64>[A-Za-z0-9+/])
(?<B16>[AEIMQUYcgkosw048])
(?<B04>[AQgw])
)
^(
((?&B64){4})*
(
(?&B64){4}|
(?&B64){2}(?&B16)=|
(?&B64)(?&B04)={2}
)
)?$}x;
# "Hello World!" base64 encoded
my $base64 = "SGVsbG8gV29ybGQh";
if ($base64 =~ $base64_regex)
{
print "Match!\n";
}
else
{
print "No match!\n"
}
Output:
Match!
I then tried to implement a similar regular expression in C++:
#include <iostream>
#include <regex>
int main()
{
std::regex base64_regex(
"(?(DEFINE)"
"(?<B64>[A-Za-z0-9+/])"
"(?<B16>[AEIMQUYcgkosw048])"
"(?<B04>[AQgw])"
")"
"^("
"((?&B64){4})*"
"("
"(?&B64){4}|"
"(?&B64){2}(?&B16)=|"
"(?&B64)(?&B04)={2}"
")"
")?$");
// "Hello World!" base64 encoded
std::string base64 = "SGVsbG8gV29ybGQh";
if (std::regex_match(base64, base64_regex))
{
std::cout << "Match!" << std::endl;
}
else
{
std::cout << "No Match!" << std::endl;
}
}
but when I run the code I get an exception telling me it is not a valid regular expression.
Catching the exception and printing the "what" string doesn't help much either. All it gives me is the following:
regex_error(error_syntax)
Obviously I could get rid of the "DEFINE" block with my pre-defined subpatterns, but that would make the whole expression very difficult to read... and, well... I like to be able to maintain my own code when I come back to it a few years later lol so that isn't really a good option.
How can I get a similar regular expression to work in C++?
Note: This must all be done within a single "std::regex" object because I am writing a library where users will be able to pass a string to be able to define their own regular expressions and I want these users to be able to "DEFINE" similar subexpressions within their regex if they need to.
How about string concatenation?
#define B64 "[A-Za-z0-9+/]"
#define B16 "[AEIMQUYcgkosw048]"
#define B04 "[AQgw]"
std::regex base64_regex(
"^("
"(" B64 "{4})*"
"("
B64 "{4}|"
B64 "{2}" B16 "=|"
B64 B04 "={2}"
")"
")?$");
I took a suggestion from the comments and checked out "boost" regex since it supports "Perl" regular expressions. I gave it a try and it worked great!
#include <boost/regex.hpp>
boost::regex base64_regex(
"(?(DEFINE)"
"(?<B64>[A-Za-z0-9+/])"
"(?<B16>[AEIMQUYcgkosw048])"
"(?<B04>[AQgw])"
")"
"("
"((?&B64){4})*"
"("
"(?&B64){4}|"
"(?&B64){2}(?&B16)=|"
"(?&B64)(?&B04)={2}"
")"
")?", boost::regex::perl);

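If the pattern is built in C++ code rather than supplied by a user, the named-fragment idea also works with plain std::regex by composing std::string pieces instead of macros. This is only a sketch of that approach; make_base64_regex and is_base64 are hypothetical names, and it does not give library users a DEFINE syntax of their own.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: assembles the base64 pattern from named fragments.
std::regex make_base64_regex() {
    const std::string B64 = "[A-Za-z0-9+/]";      // any base64 digit
    const std::string B16 = "[AEIMQUYcgkosw048]"; // digits valid before one '='
    const std::string B04 = "[AQgw]";             // digits valid before '=='
    const std::string pattern =
        "^(?:"
        "(?:" + B64 + "{4})*"
        "(?:" +
            B64 + "{4}|" +
            B64 + "{2}" + B16 + "=|" +
            B64 + B04 + "={2}"
        ")"
        ")?$";
    return std::regex(pattern);
}

// Hypothetical helper: true if s is well-formed base64 (the empty string counts).
bool is_base64(const std::string& s) {
    static const std::regex rx = make_base64_regex();
    return std::regex_match(s, rx);
}
```

The fragments stay readable and are defined once, at the cost of the pattern no longer being a single literal.
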
Why does std::regex_match not support "zero-length assertions"?

#include <regex>
int main()
{
bool b = std::regex_match("building", std::regex("^\w*uild(?=ing$)"));
//
// b is expected to be true, but the actual value is false.
//
}
My compiler is clang 3.8.
Why does std::regex_match not support "zero-length assertions"?
regex_match only succeeds when the regex matches the entire input string. Your regex (written correctly as "^\\w*uild(?=ing$)" with the backslash escaped, or as the raw string R"(^\w*uild(?=ing$))") only actually matches (consumes) the prefix build. It looks ahead for ing$ and successfully finds it, but since the whole input string wasn't consumed, regex_match rejects the match.
If you want to use regex_match but only capture the first part, you could use ^(\w*uild)ing$ (or just (\w*uild)ing since the whole string must be matched) and access the 1st capture group.
But since you're using ^ and $ anyway, you might as well use regex_search instead:
#include <iostream>
#include <regex>
int main()
{
std::cmatch m;
if (std::regex_search("building", m, std::regex(R"(^\w*uild(?=ing$))"))) {
std::cout << "m[0] = " << m[0] << std::endl; // prints "m[0] = build"
}
return 0;
}
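The three variants discussed above can be put side by side. This is a small sketch; the function names are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// regex_match: the pattern has to consume the whole input, and a lookahead
// consumes nothing, so "ing" is left over and the match is rejected.
bool whole_input_matches(const std::string& s) {
    static const std::regex rx(R"(^\w*uild(?=ing$))");
    return std::regex_match(s, rx);
}

// regex_search: accepts a match that covers only part of the input.
// Returns the matched text, or an empty string if nothing matched.
std::string prefix_via_search(const std::string& s) {
    static const std::regex rx(R"(^\w*uild(?=ing$))");
    std::smatch m;
    return std::regex_search(s, m, rx) ? m[0].str() : std::string();
}

// regex_match with a capture group: "ing" is consumed by the pattern,
// and the prefix is read from group 1.
std::string prefix_via_capture(const std::string& s) {
    static const std::regex rx(R"((\w*uild)ing)");
    std::smatch m;
    return std::regex_match(s, m, rx) ? m[1].str() : std::string();
}
```

For the input building, the first function returns false while the other two both return build.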

In C++ STL, how do I remove non-numeric characters from std::string with regex_replace?

Using the C++ Standard Template Library function regex_replace(), how do I remove non-numeric characters from a std::string and return a std::string?
This question is not a duplicate of question 747735 because that question requests how to use TR1/regex, and I'm requesting how to use standard STL regex, and because the answer given is merely some very complex documentation links. The C++ regex documentation is extremely hard to understand and poorly documented, in my opinion, so even if a question pointed out the standard C++ regex_replace documentation, it still wouldn't be very useful to new coders.
// assume #include <regex> and <string>
std::string sInput = R"(AA #-0233 338982-FFB /ADR1 2)";
std::string sOutput = std::regex_replace(sInput, std::regex(R"([\D])"), "");
// sOutput now contains only numbers
Note that the R"..." part means raw string literal and does not evaluate escape codes like a C or C++ string would. This is very important when doing regular expressions and makes your life easier.
Here's a handy list of single-character regular expression raw literal strings for your std::regex() to use for replacement scenarios:
R"([^A-Za-z0-9])" or R"([^A-Za-z\d])" = select non-alphabetic and non-numeric
R"([A-Za-z0-9])" or R"([A-Za-z\d])" = select alphanumeric
R"([0-9])" or R"([\d])" = select numeric
R"([^0-9])" or R"([^\d])" or R"([\D])" = select non-numeric
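To try the character classes from this list, a small wrapper around regex_replace is enough. This is a sketch; strip_matching is a hypothetical name, and it recompiles the regex on every call, which is fine for experiments but wasteful in a loop.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: removes every character matched by the given
// character class, e.g. R"([^0-9])" to drop all non-digits.
std::string strip_matching(const std::string& input,
                           const std::string& char_class) {
    return std::regex_replace(input, std::regex(char_class), "");
}
```

With the sample input AA #-0233 338982-FFB /ADR1 2, the class R"([^0-9])" leaves 023333898212 and R"([^A-Za-z0-9])" leaves AA0233338982FFBADR12.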
Regular expressions are overkill here.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
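// Returns true for the ASCII digits '0' through '9'.
inline bool is_digit(char ch) {
return '0' <= ch && ch <= '9';
}
std::string remove_non_digits(const std::string& input) {
std::string result;
// copy_if keeps the characters for which the predicate is true, i.e. the digits.
std::copy_if(input.begin(), input.end(),
std::back_inserter(result),
is_digit);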
return result;
}
int main() {
std::string input = "1a2b3c";
std::string result = remove_non_digits(input);
std::cout << "Original: " << input << '\n';
std::cout << "Filtered: " << result << '\n';
return 0;
}
The accepted answer is fine for the specifics of the given sample.
But it will fail for a number such as "-12.34" (it would result in "1234").
(Note that the sample could contain negative numbers.)
To match such numbers, the regex should be:
(-|\+)?(\d)+(\.(\d)+)*
Explanation: (an optional "-" or "+") followed by (a digit, repeated 1 to n times), followed by zero or more groups of (a "." followed by (a digit, repeated 1 to n times)). The dot must be escaped as \. because an unescaped . matches any character.
A bit over-reaching, but I was looking for this and the page showed up first in my search, so I'm adding my answer for future searches.
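As a sketch of how such a pattern could be used to pull a signed decimal number out of a string: first_number is a hypothetical name, and the pattern here is a slightly tightened variant (at most one fractional part) rather than the exact regex above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the first signed decimal number found in
// text, or an empty string if there is none.
std::string first_number(const std::string& text) {
    // Optional sign, one or more digits, optional ".digits" fraction.
    static const std::regex rx(R"([-+]?\d+(\.\d+)?)");
    std::smatch m;
    return std::regex_search(text, m, rx) ? m[0].str() : std::string();
}
```

Unlike stripping every non-digit, this keeps the sign and the decimal point, so -12.34 comes back as -12.34.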

regex visual studio

I was planning to use the following regex to capture path and name of a file:
std::regex capture_path_name_file("(.+)\\([^\\]+)\\.[^\\]+$");
but when running it (I'm using Visual Studio) I get the regex error
error_brack: the expression contained mismatched [ and ]
Trying to pinpoint the cause, I tried the following regex:
std::regex test("[^\\]");
and I got the same error.
I have tested my regex on regex101.com (with the slight difference that I had to use \. instead of \\.).
Thanks for any help.
The issue you have is because \\ is treated as 1 literal \ symbol in regular string literals. Biffen explained it well in his comment, [^\\] is treated as [^\], the ] is treated as a literal ] and not the closing character class delimiter (and there is no matching ] to close the character class further).
The right answer is: use _splitpath_s.
And if you want to further play with regex, you can fix it like this:
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::regex rex1(R"((.+?)([^\\.]+\.[^\\.]+)$)");
std::smatch m;
std::string str = "c:\\Python27\\REGEX\\test_regex.py";
if (regex_search(str, m, rex1)) {
std::cout << "Path: " << m[1] << std::endl;
std::cout << "File name: " << m[2] << std::endl;
}
return 0;
}
Using raw string literals, you can avoid the majority of issues related to escaping. Use R"((.+?)([^\\.]+\.[^\\.]+)$)", it will match and capture into Group 1 the file folder path, and it will capture into Group 2 the file name with extension. Note that the extension must be present.
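The difference between the two kinds of literal can be checked with a small sketch. The names compiles, PathParts and split_path are made up for illustration, and the pattern is a variant that also separates the extension; it assumes backslash-separated paths.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: reports whether std::regex accepts a pattern.
bool compiles(const std::string& pattern) {
    try {
        std::regex rx(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical holder for the pieces of a Windows-style path.
struct PathParts {
    std::string dir, name, ext;
};

// Hypothetical helper: splits "dir\name.ext"; all fields stay empty when
// the path has no backslash or no extension.
PathParts split_path(const std::string& path) {
    // Raw string: \\ is an escaped backslash for the regex engine.
    static const std::regex rx(R"(^(.*)\\([^\\]+)\.([^\\.]+)$)");
    std::smatch m;
    PathParts parts;
    if (std::regex_match(path, m, rx)) {
        parts.dir  = m[1].str();
        parts.name = m[2].str();
        parts.ext  = m[3].str();
    }
    return parts;
}
```

Here compiles("[^\\]") is false because the regex engine receives [^\], while compiles("[^\\\\]") and compiles(R"([^\\])") are both true.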

How to match absolute value using regex

I am having trouble with absolute value in regex in C++. This is what I have as the pattern:
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|"); // load -|M(x)|
I am trying to use std::tr1::regex_match( IR, result, loadAbsNM ) to match. But it is not matching anything, even though it should be.
I'm using the Visual Studio 2010 compiler.
Shortened version of the program (iostream and regex are included above it):
int main()
{
std::string IR = "load -|M(x)|";
std::smatch result;
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|");
if( std::tr1::regex_match( IR , result, loadAbsNM ) )
{
int x = 2;
std::cout << "matched!" << std::endl;
}
else
{
std::cout << "!UNABLE TO DECODE INSTRUCTION!" << std::endl;
}
}
output produced
!UNABLE TO DECODE INSTRUCTION!
Note that from your code, you're not going to have a match. The letter x won't match the regex \d+.
Also, you need a backslash in front of each pipe character. An unescaped pipe (|) is the alternation operator that separates possible entries: (a|b) means a or b.
Finally, since there is an unescaped pipe at the end, the expression also matches the empty string, which is often a bad idea.
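The effect of the pipes can be checked directly. This is a minimal sketch that uses std::regex rather than std::tr1::regex (an assumption about the toolchain); the function names and the operand parameter are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The question's pattern: the unescaped pipes split it into three
// alternatives, "load -", "M\((\d+)\)" and the empty string.
bool unescaped_pattern_matches(const std::string& s) {
    static const std::regex rx("load -|M\\((\\d+)\\)|");
    return std::regex_match(s, rx);
}

// Pattern with the pipes escaped, written as a raw string literal.
// On success, stores the number between the parentheses in operand.
bool parse_load_abs(const std::string& instruction, int& operand) {
    static const std::regex rx(R"(load -\|M\((\d+)\)\|)");
    std::smatch m;
    if (!std::regex_match(instruction, m, rx)) {
        return false;
    }
    operand = std::stoi(m[1].str());
    return true;
}
```

The unescaped pattern accepts the empty string and load - on its own but rejects load -|M(123)|, while the escaped one accepts load -|M(123)| and still rejects load -|M(x)| because x is not a digit.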
I would suggest something like this:
"load -\\|M\\((\\d+)\\)\\|"
But that won't match:
"load -|M(x)|"
You'd need to use a number instead of 'x' as in:
"load -|M(123)|"