C++11 regex back references

I can't get back references to work in C++ regular expressions.
After trying more esoteric things, I tried this simple program on gcc 4.8.1:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
regex e("(..)\\1");
string s("aaaa");
if (regex_match(s,e))
cout << "match" << endl;
return 0;
}
but it produces a runtime error. I tried various regex_constants flags such as ECMAScript and grep, to no avail. What's wrong with this way of using back references in the C++ regex engine?
Just to make sure I was not missing something trivial, I tried this in Java
class TestIt
{
public static void main (String[] args) throws java.lang.Exception
{
final String s = "aaaa";
final String e = "(..)\\1";
if (s.matches(e))
System.out.printf("match");
}
};
and obviously it prints match as expected, which is reassuring.

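The <regex> implementation in GCC's standard library (libstdc++) was not complete in GCC 4.8.1; GCC 4.9 was the first release with a working std::regex. This regex works as expected with clang. So the issue has nothing to do with how C++ treats regular expressions; it depends on the standard library implementation that comes with the compiler.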

Related

C++ std::regex does not support lookahead with a back reference

I want to match names that follow the pattern ABA, where A must not equal B.
For example,
'allen_bob_allen' is valid.
'allen_allen_allen' is not valid.
At https://regexr.com/ I wrote the regex pattern
([a-z]+)_(?!\1)([a-z]+)_\1 and it works perfectly.
But when I write this in C++ using std::regex,
#include <regex>
using namespace std;
int main() {
std::regex regex_test2("([a-z]+)_(?<!\\1)([a-z]+)_\\1");
return 0;
}
I got the following error:
libc++abi.dylib: terminating with uncaught exception of type std::__1::regex_error: The expression contained an invalid back reference.
Here is my compiler info:
Is there any way to fix this?

Simple Regex Usage in C++

I'm trying to run the most basic regex example in C++ using the standard library, and I keep getting either crashes or inconsistent behavior.
// with -std=c++11
#include <iostream>
#include <regex>
using namespace std;
int main()
{
// Copied from the documentation, this one works
if (std::regex_match ("subject", std::regex("(sub)(.*)") ))
std::cout << "string matched\n";
// The most simple example I could try, crash with "regex_error"
if (std::regex_match ("99", std::regex("[0-9]*") ))
std::cout << "integer matched\n";
}
I've tried multiple syntaxes and flags, but nothing seems to work. Since my code seems to be matching all the examples I can find, I'm struggling to see what I'm missing.
As @Wiktor Stribiżew stated, my compiler was just too old. Updating the compiler (from gcc 4.1 to gcc 4.9) solved the problem!

Why does the C++ regex_match function require the search string to be defined outside of the function?

I am using Visual Studio 2013 for development, which uses v12 of Microsoft's C++ compiler tools.
The following code executes fine, printing "foo" to the console:
#include <regex>
#include <iostream>
#include <string>
std::string get() {
return std::string("foo bar");
}
int main() {
std::smatch matches;
std::string s = get();
std::regex_match(s, matches, std::regex("^(foo).*"));
std::cout << matches[1] << std::endl;
}
// Works as expected.
The same code, with the string "s" substituted for the "get()" function, throws a "string iterators incompatible" error at runtime:
#include <regex>
#include <iostream>
#include <string>
std::string get() {
return std::string("foo bar");
}
int main() {
std::smatch matches;
std::regex_match(get(), matches, std::regex("^(foo).*"));
std::cout << matches[1] << std::endl;
}
// Debug Assertion Failed!
// Expression: string iterators incompatible
This makes no sense to me. Can anyone explain why this happens?
The reason is that get() returns a temporary string, so the match results contain iterators into an object that no longer exists, and trying to use them is undefined behaviour. The debugging assertions in the Visual Studio C++ library notice this problem and abort your program.
C++11 originally allowed what you're trying to do, but because it is so dangerous, LWG DR 2329 prevented it by adding a deleted overload of std::regex_match that is selected when you try to get match results from a temporary string. That means your program should not compile in C++14 (or in C++11 mode with compilers that implement the DR). GCC does not implement the change yet; I'll fix that.

STL and Regular Expression

I'm trying to write a string parser that uses the standard library in C++. I want to extract from an incoming string the substrings that end with a newline or a ';'. I keep getting exceptions from the regex object that I create. My pattern is:
string pattern = "(.+[\\n\\r;])";
regex cmd_sep(pattern);
I've tried it with and without the regex_constants::extended or basic flags.
You should post your error message. If you are using the Boost library, you may have left out the boost::regex tag.
Try this:
#include <boost/regex.hpp>
#include <string>
using namespace std;
int main ()
{
string pattern = "(.+[\\n\\r;])";
static const boost::regex cmd_sep(pattern);
return 0;
}

C++ regex string capture

Trying to get C++ regex string capture to work. I have tried all four combinations of Windows vs. Linux and Boost vs. native C++11. The sample code is:
#include <string>
#include <iostream>
#include <boost/regex.hpp>
//#include <regex>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
smatch sm1;
regex_search(string("abhelloworld.jpg"), sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;
smatch sm2;
regex_search(string("hell.g"), sm2, regex("(.*)g"));
cout << sm2[1] << endl;
}
The closest to working is g++ 4.7 with Boost 1.51.0: the first cout prints the expected abhelloworld., but the second cout prints nothing.
g++ 4.7 with -std=gnu++11 and <regex> instead of <boost/regex.hpp> produces no output.
Visual Studio 2012 using native <regex> yields an exception regarding incompatible string iterators.
Visual Studio 2008 with Boost 1.51.0 and <boost/regex.hpp> yields an exception regarding "Standard C++ Libraries Invalid argument".
Are these bugs in C++ regex, or am I doing something wrong?
At the time of your posting, gcc didn't support <regex>, as noted in the other answer (it does now). As for the other problems, the cause is that you are passing temporary string objects. Change your code to the following:
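#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
// Named strings outlive the match results that point into them
smatch sm1;
string s1("abhelloworld.jpg");
regex_search(s1, sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;
smatch sm2;
string s2("hell.g");
regex_search(s2, sm2, regex("(.*)g"));
cout << sm2[1] << endl;
}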
Your original example compiles because regex_search takes a const reference, which temporary objects can bind to. However, smatch only stores iterators into your temporary object, which no longer exists. The solution is to not pass temporaries.
If you look in the C++ standard at [§ 28.11.3/5], you will find the following:
Returns: The result of regex_search(s.begin(), s.end(), m, e, flags).
This means that internally only iterators into the string you pass in are used. If you pass a temporary, the match results hold iterators into that temporary, and since the temporary itself is not stored, those iterators become invalid as soon as it is destroyed.
GCC doesn't support <regex> yet. Refer to the manual.