Matching single digit with std::regex_match - c++

According to this reference, I should be able to match a single digit with
std::regex e1 ("\\d");
However, when I run the following test code I get a regex exception.
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::regex r("\\d");
std::string s("9");
if (std::regex_match(s, r)) { std::cout << "matched!" << std::endl; }
}

GCC's std::regex support is not yet ready for prime time. See: Is gcc 4.8 or earlier buggy about regular expressions?

If std::regex support is still buggy as #qwrrty suggests, the character class '[0-9]' is a substitute for '\d'.

Related

C++11 RegEx, non-greedy

I have a little bit of a problem with a C++11 RegEx and I think it is about greedynes.
Here is a little sample.
#include <stdio.h>
#include <string>
#include <regex>
int main (void)
{
std::string in="{ab}{cd}[ef]{gh}[ij][kl]"; // the input-string
std::regex rx1 ("(\\{.+?})(.*)", std::regex::extended); // non-greedy?
std::smatch match;
if (regex_match (in, match, rx1))
{
printf ("\n%s\n", match.str(1).c_str());
}
return 0;
}
I would expect
{ab}
for output.
But I got
{ab}{cd}[ef]{gh}
I would expect the result I get, if I do it greedy but not with the ? after the .+.
Should make it non-greedy, right?
So whats the problem in my idea?
Thanks for help!
Chris
You need to remove the std::regex::extended, it makes your regex POSIX ERE compliant, and that regex flavor does not support lazy quantifiers.
std::regex rx1("(\\{.+?})(.*)");
See the C++ demo

Differences in regex support between gcc 4.9.2 and gcc 5.3

Can anyone more familiar with gcc point out why the sample below fails to match on gcc 4.9.2 but succeeds on gcc 5.3? Is there anything I can do to alternate the pattern so that it would work (also seems to work fine on VS 2013)?
#include <iostream>
#include <regex>
std::regex pattern("HTTP/(\\d\\.\\d)\\s(\\d{3})\\s(.*)\\r\\n(([!#\\$%&\\*\\+\\-\\./a-zA-Z\\^_`\\|-]+\\:[^\\r]+\\r\\n)*)\\r\\n");
const char* test = "HTTP/1.1 200 OK\r\nHost: 192.168.1.72:8080\r\nContent-Length: 86\r\n\r\n";
int main()
{
std::cmatch results;
bool matched = std::regex_search(test, test + strlen(test), results, pattern);
std::cout << matched;
return 0;
}
I assume I am using something that is not supported in gcc 4.9.2 but was added on or fixed later, but I have no idea where to look it up.
UPDATE
Due to the amount of help and suggestions I tried to backtrack the issue instead of just switching to gcc 5. I get correct matches with this modification:
#include <iostream>
#include <regex>
std::regex pattern("HTTP/(\\d\\.\\d)\\s(\\d{3})\\s(.*?)\\r\\n(?:([^:]+\\:[^\\r]+\\r\\n)*)\\r\\n");
const char* test = "HTTP/1.1 200 OK\r\nHost: 192.168.1.72:8080\r\nContent-Length: 86\r\n\r\n";
int main()
{
std::cmatch results;
bool matched = std::regex_search(test, test + strlen(test), results, pattern);
std::cout << matched << std::endl;
if (matched)
{
for (const auto& result : results)
{
std::cout << "matched: " << result.str() << std::endl;
}
}
return 0;
}
So I guess the problem is with the group that matches the HTTP header name. Will check further.
UPDATE 2
std::regex pattern(R"(HTTP/(\d\.\d)\s(\d{3})\s(.*?)\r\n(?:([!#$&a-zA-Z^_`|-]+\:[^\r]+\r\n)*)\r\n)")
is the last thing that works. Adding any of the remaining characters that I had in my group - %*+-. (escaped or not epscaped) - breaks it.
So I know GCC did not support the c++11 regex library until GCC 4.9 officially. See Is gcc 4.8 or earlier buggy about regular expressions?. Since it was so new, it is likely that it had a few bugs to smooth out. Pinning down the exact cause would be difficult, but the problem is in the implementation and not in the regex.
Side note: I remember spending 20 minutes one time trying to figure out what was wrong with my regex when I found the mentioned article and realized that I was using gcc 4.8.*. Since the machine I had to run on wasn't mine, I basically ended up compiling on a different, similar platform with a later version of gcc and a few hacks and then it ran on the target platform.

Why does the c++ regex_match function require the search string to be defined outside of the function?

I am using Visual Studio 2013 for development, which uses v12 of Microsoft's c++ compiler tools.
The following code executes fine, printing "foo" to the console:
#include <regex>
#include <iostream>
#include <string>
std::string get() {
return std::string("foo bar");
}
int main() {
std::smatch matches;
std::string s = get();
std::regex_match(s, matches, std::regex("^(foo).*"));
std::cout << matches[1] << std::endl;
}
// Works as expected.
The same code, with the string "s" substituted for the "get()" function, throws a "string iterators incompatible" error at runtime:
#include <regex>
#include <iostream>
#include <string>
std::string get() {
return std::string("foo bar");
}
int main() {
std::smatch matches;
std::regex_match(get(), matches, std::regex("^(foo).*"));
std::cout << matches[1] << std::endl;
}
// Debug Assertion Failed!
// Expression: string iterators incompatible
This makes no sense to me. Can anyone explain why this happens?
The reason is that get() returns a temporary string, so the match results contains iterators into an object that no longer exists, and trying to use them is undefined behaviour. The debugging assertions in the Visual Studio C++ library notice this problem and abort your program.
Originally C++11 did allow what you're trying to do, but because it is so dangerous it was prevented by adding a deleted overload of std::regex_match which gets used when trying to get match results from a temporary string, see LWG DR 2329. That means your program should not compile in C++14 (or in compilers that implement the DR in C++11 mode too). GCC does not yet implement the change yet, I'll fix that.

c++11 regex back references

I don't manage to use back references in regular expression in c++.
After trying more esoteric things, I tried this simple script on gcc 4.8.1:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
regex e("(..)\\1");
string s("aaaa");
if (regex_match(s,e))
cout << "match" << endl;
return 0;
}
but it produces a runtime error. I tried various flags in regex_constants like ECMAScript or grep but to no avail. What's wrong with this way of using back references in C++ regex engine?
Just to make sure I was not missing something trivial, I tried this in Java
class TestIt
{
public static void main (String[] args) throws java.lang.Exception
{
final String s = "aaaa";
final String e = "(..)\\1";
if (s.matches(e))
System.out.printf("match");
}
};
and obviously it prints match as expected, which is reassuring.
The regex engine included in gcc (in libstdc++) is not fully working yet. This regex works as expected on clang. So this issue has nothing to do with the way C++ treats regular expressions; rather it depends on the compiler used.

C++ regex string capture

Tring to get C++ regex string capture to work. I have tried all four combinations of Windows vs. Linux, Boost vs. native C++ 0x11. The sample code is:
#include <string>
#include <iostream>
#include <boost/regex.hpp>
//#include <regex>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
smatch sm1;
regex_search(string("abhelloworld.jpg"), sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;
smatch sm2;
regex_search(string("hell.g"), sm2, regex("(.*)g"));
cout << sm2[1] << endl;
}
The closest that works is g++ (4.7) with Boost (1.51.0). There, the first cout outputs the expected abhelloworld. but nothing from the second cout.
g++ 4.7 with -std=gnu++11 and <regex> instead of <boost/regex.hpp> produces no output.
Visual Studio 2012 using native <regex> yields an exception regarding incompatible string iterators.
Visual Studio 2008 with Boost 1.51.0 and <boost/regex.hpp> yields an exception regarding "Standard C++ Libraries Invalid argument".
Are these bugs in C++ regex, or am I doing something wrong?
Are these bugs in C++ regex, or am I doing something wrong?
At the time of your posting, gcc didn't support <regex> as noted in the other answer (it does now). As for the other problems, your problem is you are passing temporary string objects. Change your code to the following:
smatch sm1;
string s1("abhelloworld.jpg");
regex_search(s1, sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;
smatch sm2;
string s2("hell.g");
regex_search(s2, sm2, regex("(.*)g"));
cout << sm2[1] << endl;
Your original example compiles because regex_search takes a const reference which temporary objects can bind to, however, smatch only stores iterators into your temporary object which no longer exists. The solution is to not pass temporaries.
If you look in the C++ standard at [ยง 28.11.3/5], you will find the following:
Returns: The result of regex_search(s.begin(), s.end(), m, e, flags).
What this means is that internally, only iterators to your passed in string are used, so if you pass in a temporary, iterators to that temporary object will be used which are invalid and the actual temporary itself is not stored.
GCC doesn't support <regex> yet. Refer to the Manual