I have a small problem with a C++11 regex, and I think it is about greediness.
Here is a small sample.
#include <stdio.h>
#include <string>
#include <regex>
int main (void)
{
std::string in="{ab}{cd}[ef]{gh}[ij][kl]"; // the input-string
std::regex rx1 ("(\\{.+?})(.*)", std::regex::extended); // non-greedy?
std::smatch match;
if (regex_match (in, match, rx1))
{
printf ("\n%s\n", match.str(1).c_str());
}
return 0;
}
I would expect
{ab}
for output.
But I got
{ab}{cd}[ef]{gh}
I would expect the result I got from a greedy match, but not with the ? after the .+.
That should make it non-greedy, right?
So what is wrong with my reasoning?
Thanks for help!
Chris
You need to remove the std::regex::extended flag. It makes your regex POSIX ERE compliant, and that regex flavor does not support lazy quantifiers.
std::regex rx1("(\\{.+?})(.*)");
From what I have researched, the expression "[:alpha:]" should match any alphabetic character, but it only matches lowercase characters and not uppercase ones. I am not sure what is wrong with it.
std::regex e ("[:alpha:]");
if(std::regex_match("A",e))
std::cout<<"hi";
else
std::cout<<"no";
Change this:
std::regex e ("[:alpha:]");
to:
std::regex e ("[[:alpha:]]");
As Adrian stated, the brackets in the class names are additional to those opening and closing the class definition. For example, [[:alpha:]] is a character class that matches any alphabetic character.
You have to use [[:alpha:]], as in this example:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::regex e ("[[:alpha:]]");
if(std::regex_match("A",e))
std::cout<<"hi";
else
std::cout<<"no";
return 0;
}
Can anyone more familiar with gcc point out why the sample below fails to match on gcc 4.9.2 but succeeds on gcc 5.3? Is there anything I can do to alter the pattern so that it works there too (it also seems to work fine on VS 2013)?
#include <iostream>
#include <regex>
#include <cstring> // for strlen
std::regex pattern("HTTP/(\\d\\.\\d)\\s(\\d{3})\\s(.*)\\r\\n(([!#\\$%&\\*\\+\\-\\./a-zA-Z\\^_`\\|-]+\\:[^\\r]+\\r\\n)*)\\r\\n");
const char* test = "HTTP/1.1 200 OK\r\nHost: 192.168.1.72:8080\r\nContent-Length: 86\r\n\r\n";
int main()
{
std::cmatch results;
bool matched = std::regex_search(test, test + strlen(test), results, pattern);
std::cout << matched;
return 0;
}
I assume I am using something that is not supported in gcc 4.9.2 but was added or fixed later, but I have no idea where to look it up.
UPDATE
Given the help and suggestions, I tried to track down the issue instead of just switching to gcc 5. I get correct matches with this modification:
#include <iostream>
#include <regex>
#include <cstring> // for strlen
std::regex pattern("HTTP/(\\d\\.\\d)\\s(\\d{3})\\s(.*?)\\r\\n(?:([^:]+\\:[^\\r]+\\r\\n)*)\\r\\n");
const char* test = "HTTP/1.1 200 OK\r\nHost: 192.168.1.72:8080\r\nContent-Length: 86\r\n\r\n";
int main()
{
std::cmatch results;
bool matched = std::regex_search(test, test + strlen(test), results, pattern);
std::cout << matched << std::endl;
if (matched)
{
for (const auto& result : results)
{
std::cout << "matched: " << result.str() << std::endl;
}
}
return 0;
}
So I guess the problem is with the group that matches the HTTP header name. Will check further.
UPDATE 2
std::regex pattern(R"(HTTP/(\d\.\d)\s(\d{3})\s(.*?)\r\n(?:([!#$&a-zA-Z^_`|-]+\:[^\r]+\r\n)*)\r\n)")
is the last thing that works. Adding any of the remaining characters that I had in my group, %*+-. (escaped or not escaped), breaks it.
As far as I know, GCC did not officially support the C++11 regex library until GCC 4.9 (see "Is gcc 4.8 or earlier buggy about regular expressions?"). Since the implementation was so new, it likely had a few bugs to smooth out. Pinning down the exact cause would be difficult, but the problem is in the implementation and not in the regex.
Side note: I remember spending 20 minutes one time trying to figure out what was wrong with my regex when I found the mentioned article and realized that I was using gcc 4.8.*. Since the machine I had to run on wasn't mine, I basically ended up compiling on a different, similar platform with a later version of gcc and a few hacks and then it ran on the target platform.
According to this reference, I should be able to match a single digit with
std::regex e1 ("\\d");
However, when I run the following test code I get a regex exception.
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::regex r("\\d");
std::string s("9");
if (std::regex_match(s, r)) { std::cout << "matched!" << std::endl; }
}
GCC's std::regex support is not yet ready for prime time. See: Is gcc 4.8 or earlier buggy about regular expressions?
If std::regex support is still buggy, as @qwrrty suggests, the character class '[0-9]' is a substitute for '\d'.
I'm trying to write a string parser that uses the standard library methods in C++. I want to parse out of an incoming string substrings that end with a newline or a ';'. I keep getting exceptions from the regex object that I create. My pattern is:
string pattern = "(.+[\\n\\r;])";
regex cmd_sep(pattern);
I've tried it with and without the regex_constants::extended or basic flags.
You should post your error message. If you are using the Boost library, you may have missed the boost::regex tag.
Try this
#include <boost/regex.hpp>
#include <string>
using namespace std;
int main ()
{
string pattern = "(.+[\\n\\r;])";
static const boost::regex cmd_sep(pattern);
return 0;
}
I need help figuring out how to extract text from context (Honda from str). I need something analogous to a Perl regex.
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
string str;
str = "<make>Honda</make>";
//Code to extract Honda from above string
cout<<str<<endl;
cin.get();
return 0;
}
need something analogous to Perl regex
Is this a trick question? :) That "something" is PCRE: "Perl-Compatible Regular Expressions".
What you really need is libxml2, and the XPath query //make/text().
In C# (which I don't program in) I know there is a Regex class, but in C++ regex support may come from external libraries.