Is this C++11 regex error me or the compiler? - c++

OK, this isn't the original program I had this problem in, but I duplicated it in a much smaller one. Very simple problem.
main.cpp:
#include <cstdio>
#include <iostream>
#include <regex>
using namespace std;
int main()
{
    regex r1("S");
    printf("S works.\n");
    regex r2(".");
    printf(". works.\n");
    regex r3(".+");
    printf(".+ works.\n");
    regex r4("[0-9]");
    printf("[0-9] works.\n");
    return 0;
}
Compiled successfully with this command, no error messages:
$ g++ -std=c++0x main.cpp
The last line of g++ -v, by the way, is:
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)
And the result when I try to run it:
$ ./a.out
S works.
. works.
.+ works.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted
It happens the same way if I change r4 to \\s, \\w, or [a-z]. Is this a problem with the compiler? I might be able to believe that C++11's regex engine has different ways of saying "whitespace" or "word character," but square brackets not working is a stretch. Is it something that's been fixed in 4.6.2?
EDIT:
Joachim Pileborg has supplied a partial solution, using an extra regex_constants parameter to enable a syntax that supports square brackets, but neither basic, extended, awk, nor ECMAScript seem to support backslash-escaped terms like \\s, \\w, or \\t.
EDIT 2:
Using raw strings (R"(\w)" instead of "\\w") doesn't seem to work either.

Update: <regex> is now implemented and released in GCC 4.9.0
Old answer:
ECMAScript syntax accepts [0-9], \s, \w, etc, see ECMA-262 (15.10). Here's an example with boost::regex that also uses the ECMAScript syntax by default:
#include <boost/regex.hpp>
int main(int argc, char* argv[]) {
    using namespace boost;
    regex e("[0-9]");
    return argc > 1 ? !regex_match(argv[1], e) : 2;
}
It works:
$ g++ -std=c++0x *.cc -lboost_regex && ./a.out 1
According to the C++11 standard (28.8.2) basic_regex() uses regex_constants::ECMAScript flag by default so it must understand this syntax.
Is this C++11 regex error me or the compiler?
gcc-4.6.1 doesn't support c++11 regular expressions (28.13).

The error is because creating a regex by default uses ECMAScript syntax for the expression, which doesn't support brackets. You should declare the expression with the basic or extended flag:
std::regex r4("[0-9]", std::regex_constants::basic);
Edit: It seems that libstdc++ (the C++ standard library that ships with GCC) doesn't fully implement regular expressions yet. Its status document says that the modified ECMAScript regular expression grammar is not implemented yet.

Regex support improved between gcc 4.8.2 and 4.9.2. For example, the regex =[A-Z]{3} was failing for me with:
Regex error
After upgrading to gcc 4.9.2, it works as expected.

Related

std::regex issue with gcc 5.1

I have the following regex expression:
(.*)[[:space:]]+(.+)[[:space:]]+(Error|Information|Trace)([[:space:]]+[1234567890]+)?([[:space:]]+[ndmptl]{1,6})?
When I try to initialize an std::regex variable with that expression, it intermittently crashes.
Here is some test code:
std::string matchRegex ("(.*)[[:space:]]+(.+)[[:space:]]+(Error|Information|Trace)([[:space:]]+[1234567890]+)?([[:space:]]+[ndmptl]{1,6})?");
std::regex rm (matchRegex); // intermittently crashes on this line
I am using gcc 5.1 on Windows 7 64-bit. In particular, I am using the TDM gcc 64 MinGW flavor of gcc.
The crash only happens when I generate 32-bit code (via the gcc flag -m32).
64-bit code always works.
Is there a bug in the std::regex implementation that might be causing this?
I checked my regex with several online regex checkers and it passed 100% of the time.

Why do different GCC 4.9.2 installations give different results for this regex match?

I posted the following code on ideone and Coliru:
#include <iostream>
#include <regex>
#include <string>
int main()
{
    std::string example{" <match1> <match2> <match3>"};
    std::regex re{"<([^>]+)>"};
    std::regex_token_iterator<std::string::iterator> it{example.begin(), example.end(), re, 1};
    decltype(it) end{};
    while (it != end) std::cout << *it++ << std::endl;
    return 0;
}
Both sites use GCC 4.9.2. I don't know what compilation arguments ideone uses, but there is nothing unusual in Coliru's.
Coliru doesn't give me the match1 result:
Coliru
# g++ -v 2>&1 | grep version; \
# g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
gcc version 4.9.2 (GCC)
match2
match3
ideone (and, incidentally, Coliru's clang 3.5.0 using libc++)
match1
match2
match3
Does my code have undefined behaviour or something? What could cause this?
It's a bug in libstdc++'s regex_token_iterator copy constructor, as called by the postincrement operator. The bug was fixed in December 2014; versions of gcc 4.9 and 5.x released since then will have the fix. The nature of the bug is that the copy of the iterator aliases the target of the copy, leading to the observed behavior.
The workaround is to use preincrement - this is desirable from a microoptimisation point of view as well, as regex_token_iterator is a reasonably heavy class:
for (; it != end; ++it) std::cout << *it << std::endl;
The code is valid.
The only plausible explanation is that the standard library versions differ; although for the most part standard library implementations are shipped with compilers, they can be upgraded independently through, say, a Linux package manager.
In this instance it seems that this is a libstdc++ fault that was fixed late last year:
Coliru has __GLIBCXX__ == 20141030
ideone has __GLIBCXX__ == 20141220
The most likely match on Bugzilla that I can find is bug 63497 but, to be honest, I'm not convinced this particular bug was ever fully covered by Bugzilla. Joseph Mansfield identified that these specific symptoms in this specific case are triggered by the post-fix increment, at least.

What part of regex is supported by GCC 4.9?

I don't get this. GCC is supposed to support regular expressions, but according to their TR1 status page
http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.tr1
"7 Regular Expressions" is not supported.
But then under "28 Regular expressions" on the C++11 status page, they are marked as supported:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011
Could you please explain what is actually the standard and what is not?
GCC 4.9 does indeed support the C++11 <regex> functionality but not the tr1 version. Note that the difference is that parts (all?) of the latter exist within a tr1:: namespace while the C++11 <regex> is within namespace std. There's not much point to going backwards and adding in tr1 support when C++11 has been published for some time now.
The following can be found in the GCC 4.9 release notes:
"Support for various C++14 additions have been added to the C++ Front End, on the standard C++ library side the most important addition is support for the C++11 regex"
If you want to install the latest GCC 4.9 version to try it yourself, you can follow this SO question:
How do I compile and run GCC 4.9.x?
Here is a sample program that was compiled with GCC 4.9 and then run.
// Sample program
#include <regex>
#include <iostream>
using namespace std;
int main() {
    regex reg("[0-9]+");
    if (regex_match("123000", reg)) {
        cout << "It's a match!" << endl;
    }
    return 0;
}
$ g++ -std=c++11 foo.cpp -o foo
$ g++ -v
Using built-in specs.
COLLECT_GCC=/home/mantosh/gcc-4.9.0/bin/g++
COLLECT_LTO_WRAPPER=/home/mantosh/gcc-4.9.0/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /home/mantosh/objdir/../gcc-4.9.0/configure --disable-multilib --prefix=/home/mantosh/gcc-4.9.0
Thread model: posix
gcc version 4.9.0 (GCC)
$ ./foo
It's a match!

How to use <regex> in C++11

I'm trying to use a regular expression to accept strings that have sequences like
ifelseifelseififif
So every else needs an if, but not every if needs an else. I know I could do this with pen and paper using a simple regular expression like ((if)*+(ifelse)*)*. I'm not sure whether I'll be able to do this with the library, as I've never used it before. Would it be possible to accept or reject a string based on a regular expression like the one above?
I wrote this sample to get my feet wet and I don't understand why it returns false. Isn't regex_search() supposed to find substring matches? That snippet prints nope every time.
#include <regex>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string sequence="ifelse";
    regex rx("if");
    if(regex_search(sequence.begin(),sequence.end(),rx)){
        cout<<"match found"<<endl;
    }else{
        cout<<"nope"<<endl;
    }
}
I'm using g++ 4.7 and have tried compiling with both g++ -std=gnu++11 reg.cpp and g++ -std=c++11 reg.cpp.
If you are compiling with g++ it may be because regex is not fully supported yet. See here for current C++11 status in g++.
This prints "match found"; I just ran it. It wouldn't compile if you weren't using C++11, but here's how I compiled it:
clang++ -std=c++11 -stdlib=libc++ reg.cpp

C++ TR1 Regular Expressions Not Available

I'm trying to utilize the 'TR1' regular expression extensions for some C++ string parsing.
I've read that the <regex> header and the namespace std::tr1 are required for this.
I can compile with the <regex> header present (though it forces me to use either the -std=c++0x or the -std=gnu++0x flag).
However, when I attempt to use the std::tr1 namespace in my program, compilation fails with the message that tr1 "is not a namespace name". I can't do things like
std::tr1::regex rx("mypattern");
I've read that TR1 regular expressions have been supported since gcc 4.3.0. I'm using g++ through gcc 4.4.5.
Am I missing something?
g++ 4.7 doesn't implement regular expressions yet.
But despite that fact, in C++11 regex has been moved from the namespace std::tr1 to std. So, instead of std::tr1::regex, you should write std::regex:
std::regex rx("mypattern");
I don't know which g++ versions before 4.7 this also applies to, but this ideone example compiles fine with g++ 4.7. Remember, however, that regex isn't actually implemented in this compiler version, so code that compiles may still fail at run time.