C++11 RegEx, non-greedy - c++

I have a bit of a problem with a C++11 regex, and I think it is about greediness.
Here is a little sample.
#include <stdio.h>
#include <string>
#include <regex>
int main (void)
{
std::string in="{ab}{cd}[ef]{gh}[ij][kl]"; // the input-string
std::regex rx1 ("(\\{.+?})(.*)", std::regex::extended); // non-greedy?
std::smatch match;
if (regex_match (in, match, rx1))
{
printf ("\n%s\n", match.str(1).c_str());
}
return 0;
}
I would expect
{ab}
for output.
But I got
{ab}{cd}[ef]{gh}
The output I get is what I would expect from a greedy match, but not with the ? after the .+.
That should make it non-greedy, right?
So what's wrong with my idea?
Thanks for help!
Chris

You need to remove std::regex::extended. That flag selects the POSIX ERE grammar, and that regex flavor does not support lazy quantifiers. Without the flag, std::regex uses its default ECMAScript grammar, where .+? is lazy.
std::regex rx1("(\\{.+?})(.*)");
See the C++ demo
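To make the difference easy to try out, here is a minimal sketch that wraps the pattern from the question in a small function. The name first_group and the grammar parameter are made up for illustration; the default argument is the ECMAScript grammar that std::regex uses when no flag is given.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the question): returns capture group 1 of
// the question's pattern, or an empty string when the input does not match.
std::string first_group(const std::string& input,
                        std::regex::flag_type grammar = std::regex::ECMAScript)
{
    // Same pattern as in the question; only the grammar flag varies.
    const std::regex rx("(\\{.+?})(.*)", grammar);
    std::smatch match;
    if (std::regex_match(input, match, rx)) {
        return match.str(1);
    }
    return std::string();
}
```

With the default grammar, first_group("{ab}{cd}[ef]{gh}[ij][kl]") returns {ab}. Passing std::regex::extended instead reproduces the setup from the question.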

Related

C++ regex character class not matching [duplicate]

From what I have read, the expression "[:alpha:]" should match any alphabetic character, but it only matches lowercase characters, not uppercase ones. I am not sure what is wrong with it.
std::regex e ("[:alpha:]");
if(std::regex_match("A",e))
std::cout<<"hi";
else
std::cout<<"no";
Change this:
std::regex e ("[:alpha:]");
to:
std::regex e ("[[:alpha:]]");
As Adrian stated: Please note that the brackets in the class names are additional to those opening and closing the class definition. For example: [[:alpha:]] is a character class that matches any alphabetic character. Read more in the ref.
You have to use [[:alpha:]]
see online example
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::regex e ("[[:alpha:]]");
if(std::regex_match("A",e))
std::cout<<"hi";
else
std::cout<<"no";
return 0;
}
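A small sketch to compare the two spellings side by side. The helper name matches_whole is an assumption for illustration. Without the outer brackets, "[:alpha:]" is an ordinary bracket expression containing the characters ':', 'a', 'l', 'p' and 'h', which is why some lowercase letters appear to match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole text matches the pattern.
bool matches_whole(const std::string& text, const std::string& pattern)
{
    const std::regex rx(pattern);
    return std::regex_match(text, rx);
}
```

For example, matches_whole("A", "[[:alpha:]]") is true, while matches_whole("A", "[:alpha:]") is false because 'A' is not one of the listed characters.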

Differences in regex support between gcc 4.9.2 and gcc 5.3

Can anyone more familiar with gcc point out why the sample below fails to match on gcc 4.9.2 but succeeds on gcc 5.3? Is there anything I can do to alter the pattern so that it works on both (it also seems to work fine on VS 2013)?
#include <cstring> // strlen
#include <iostream>
#include <regex>
std::regex pattern("HTTP/(\\d\\.\\d)\\s(\\d{3})\\s(.*)\\r\\n(([!#\\$%&\\*\\+\\-\\./a-zA-Z\\^_`\\|-]+\\:[^\\r]+\\r\\n)*)\\r\\n");
const char* test = "HTTP/1.1 200 OK\r\nHost: 192.168.1.72:8080\r\nContent-Length: 86\r\n\r\n";
int main()
{
std::cmatch results;
bool matched = std::regex_search(test, test + strlen(test), results, pattern);
std::cout << matched;
return 0;
}
I assume I am using something that is not supported in gcc 4.9.2 but was added on or fixed later, but I have no idea where to look it up.
UPDATE
Due to the amount of help and suggestions I tried to backtrack the issue instead of just switching to gcc 5. I get correct matches with this modification:
#include <cstring> // strlen
#include <iostream>
#include <regex>
std::regex pattern("HTTP/(\\d\\.\\d)\\s(\\d{3})\\s(.*?)\\r\\n(?:([^:]+\\:[^\\r]+\\r\\n)*)\\r\\n");
const char* test = "HTTP/1.1 200 OK\r\nHost: 192.168.1.72:8080\r\nContent-Length: 86\r\n\r\n";
int main()
{
std::cmatch results;
bool matched = std::regex_search(test, test + strlen(test), results, pattern);
std::cout << matched << std::endl;
if (matched)
{
for (const auto& result : results)
{
std::cout << "matched: " << result.str() << std::endl;
}
}
return 0;
}
So I guess the problem is with the group that matches the HTTP header name. Will check further.
UPDATE 2
std::regex pattern(R"(HTTP/(\d\.\d)\s(\d{3})\s(.*?)\r\n(?:([!#$&a-zA-Z^_`|-]+\:[^\r]+\r\n)*)\r\n)")
is the last thing that works. Adding any of the remaining characters that I had in my group - %*+-. (escaped or not escaped) - breaks it.
GCC did not officially support the C++11 regex library until GCC 4.9. See Is gcc 4.8 or earlier buggy about regular expressions?. Because the implementation was so new, it likely still had a few bugs to iron out. Pinning down the exact cause would be difficult, but the problem is in the implementation and not in your regex.
Side note: I remember spending 20 minutes one time trying to figure out what was wrong with my regex when I found the mentioned article and realized that I was using gcc 4.8.*. Since the machine I had to run on wasn't mine, I basically ended up compiling on a different, similar platform with a later version of gcc and a few hacks and then it ran on the target platform.
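If switching compilers is not an option, one possible workaround is to avoid the single large pattern and parse the status line and the header lines with two simpler patterns. The sketch below is an assumption on my part, not something tested against gcc 4.9.2; the names ResponseHead and parse_response_head are made up, and the header-name class is relaxed to "anything except a colon or line break", as in the first update.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for the parsed pieces; not part of the question.
struct ResponseHead {
    std::string version;
    int status = 0;
    std::string reason;
    std::vector<std::pair<std::string, std::string>> headers;
};

// Parses "HTTP/x.y NNN reason\r\n", then "Name: value\r\n" lines,
// and requires a terminating empty line ("\r\n").
bool parse_response_head(const std::string& text, ResponseHead& out)
{
    static const std::regex status_line(R"(HTTP/(\d\.\d)\s(\d{3})\s([^\r]*)\r\n)");
    static const std::regex header_line(R"(([^:\r\n]+):[ \t]*([^\r]*)\r\n)");
    // match_continuous forces each match to start exactly at the given position.
    const auto anchored = std::regex_constants::match_continuous;

    std::smatch m;
    if (!std::regex_search(text, m, status_line, anchored)) {
        return false;
    }
    out.version = m.str(1);
    out.status = std::stoi(m.str(2));
    out.reason = m.str(3);

    // Consume header lines one at a time, starting right after the status line.
    std::string::const_iterator pos = m[0].second;
    while (std::regex_search(pos, text.cend(), m, header_line, anchored)) {
        out.headers.emplace_back(m.str(1), m.str(2));
        pos = m[0].second;
    }
    // The head must end with an empty line.
    return text.cend() - pos >= 2 && pos[0] == '\r' && pos[1] == '\n';
}
```

Each pattern stays free of nested repeated groups and escaped characters inside brackets other than \r, \n and \t, which keeps the part that depends on the regex implementation small.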

Matching single digit with std::regex_match

According to this reference, I should be able to match a single digit with
std::regex e1 ("\\d");
However, when I run the following test code I get a regex exception.
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::regex r("\\d");
std::string s("9");
if (std::regex_match(s, r)) { std::cout << "matched!" << std::endl; }
}
GCC's std::regex support is not yet ready for prime time. See: Is gcc 4.8 or earlier buggy about regular expressions?
If std::regex support is still buggy as @qwrrty suggests, the character class '[0-9]' is a substitute for '\d'.
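A minimal sketch of that substitution, with the construction wrapped in a try block so that a std::regex_error from an incomplete implementation is reported as "no match" instead of terminating the program. The function name is_single_digit is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when s consists of exactly one decimal digit.
bool is_single_digit(const std::string& s)
{
    try {
        // "[0-9]" avoids the "\\d" escape that triggered the exception.
        const std::regex digit("[0-9]");
        return std::regex_match(s, digit);
    } catch (const std::regex_error&) {
        // Thrown when the implementation cannot compile the pattern.
        return false;
    }
}
```

Here is_single_digit("9") is true and is_single_digit("42") is false, because std::regex_match has to consume the whole string.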

STL and Regular Expression

I'm trying to write a string parser that uses the standard library methods in C++. I want to parse out of an incoming string substrings that end with a newline or a ';'. I keep getting exceptions from the regex object that I create. My pattern is:
string pattern = "(.+[\\n\\r;])";
regex cmd_sep(pattern);
I've tried it with and without the regex_constants::extended or basic flags.
If you are using the Boost library, please post your error message. You may also have left off the boost::regex tag.
Try this
#include <boost/regex.hpp>
#include <string>
using namespace std;
int main ()
{
string pattern = "(.+[\\n\\r;])";
static const boost::regex cmd_sep(pattern);
return 0;
}
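For the actual goal (cutting the input into pieces that end with a newline or ';'), a different approach is to match only the separators and let std::sregex_token_iterator return the text between them. This is a sketch under the assumption that a standard library with working std::regex is available; the function name split_commands is made up, and empty pieces are dropped.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits input at runs of '\n', '\r' or ';'.
std::vector<std::string> split_commands(const std::string& input)
{
    // "\n" and "\r" are C++ escapes here, so the regex receives the
    // actual control characters inside the bracket expression.
    static const std::regex separator("[\n\r;]+");
    std::vector<std::string> commands;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(input.begin(), input.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0) {
            commands.push_back(it->str());
        }
    }
    return commands;
}
```

Calling split_commands("ls -l;pwd\r\necho hi\n") yields the three strings "ls -l", "pwd" and "echo hi".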

How do I remove meta tags from a string using C++?

Need help figuring out how to extract text from context (Honda from str), need something analogous to Perl regex
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
string str;
str = "<make>Honda</make>";
//Code to extract Honda from above string
cout<<str<<endl;
cin.get();
return 0;
}
need something analogous to Perl regex
Is this a trick question? :) That "something" is PCRE: "Perl-Compatible Regular Expressions".
What you really need is libxml2, and the XPath query //make/text().
I don't program in C#, but I know it has a built-in Regex class. In C++, regex support may have to come from an external library.
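For a string as simple as the one in the question, the C++11 standard library's std::regex can also do the job. The sketch below is only an illustration: the function name extract_tag_text is made up, the tag name is assumed to contain no regex metacharacters, and the element is assumed to hold plain text with no nested tags. It is not a replacement for a real XML parser.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text between <tag> and </tag>,
// or an empty string when no such element is found.
std::string extract_tag_text(const std::string& input, const std::string& tag)
{
    // [^<]* captures everything up to the next '<'.
    const std::regex rx("<" + tag + ">([^<]*)</" + tag + ">");
    std::smatch match;
    if (std::regex_search(input, match, rx)) {
        return match.str(1);
    }
    return std::string();
}
```

With the string from the question, extract_tag_text(str, "make") returns Honda.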