C++ regex_match behaviour - c++

I am using C++ regex and was not able to understand the output of the following program.
#include <iostream>
#include <regex>
#include <algorithm>
#include <string>
using namespace std;
int main(){
regex r("a(b+)(c+)d");
string s ="abcd";
smatch m;
cout << s << endl;
const bool b = regex_match(s,m, r);
cout << b <<endl; // prints 1 - OK
if(b){
cout << m[0] << endl; // prints abcd - OK
cout << m[1] << endl; // prints ab - Why? Should it be just b?
cout<< m[2] << endl; // prints bc - Why? Should it be just c?
}
}
As per my exposure to regex in other languages, each pair of parentheses should capture only the part of the string it matches, so the output should be:
1
abcd
b
c
EDIT:
I am using g++ 4.6

Assuming you are using g++, you should note that its implementation of <regex> (section 28) is incomplete. Note the listings for basic_regex, sub_match, and match_results are declared "Partial".
For more info on g++, I think this post from a year ago is still relevant (as is this bug report).
This would explain why it's not giving the results that you expect. You may wish to try Boost regex in the meantime.
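For comparison, here is a minimal sketch of what a conforming <regex> implementation should report for the pattern in the question. The helper name `match_groups` is hypothetical and only serves the illustration; it is not part of the standard library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the full match followed by each capture group,
// or an empty vector when the whole string does not match.
std::vector<std::string> match_groups(const std::string& text,
                                      const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(text, m, re)) {
        for (const auto& sub : m) {
            groups.push_back(sub.str());
        }
    }
    return groups;
}
```

Calling `match_groups("abcd", "a(b+)(c+)d")` on a working implementation should yield the three strings the question expects: the full match, then the `b` group, then the `c` group.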

Related

std::regex fails to match char

I'm trying to get a regex to match a char containing a space ' '.
When compiled with g++ (MinGW 8.1.0 on Windows) it reliably fails to match.
When compiled with onlinegdb it reliably matches.
Why would the behaviour differ between these two? What would be the best way to get my regex to match properly without using a std::string (which does match correctly)?
My code:
#include <iostream>
#include <regex>
#include <string>
int main() {
char a = ' ';
std::string b = " ";
std::cout << std::regex_match(b, std::regex("\\s+")) << '\n'; // always writes 1
std::cout << std::regex_match(&a, std::regex("\\s+")) << '\n'; // writes 1 in onlinegdb, 0 with MinGW
}
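One likely cause, offered as an assumption rather than a confirmed diagnosis of the MinGW result: the pointer overload of `std::regex_match` expects a null-terminated string, and `&a` points at a single `char` with no terminator. The call therefore reads past the object, and what it finds there can differ between builds. A sketch that passes an explicit iterator range instead (the name `is_whitespace_char` is hypothetical):

```cpp
#include <regex>

// Matches exactly one character against \s+ by passing an explicit
// [first, last) range, so no null terminator is required.
bool is_whitespace_char(char c) {
    static const std::regex re("\\s+");
    const char* first = &c;
    return std::regex_match(first, first + 1, re);
}
```

With this shape, `is_whitespace_char(' ')` does not depend on whatever happens to follow the character in memory.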

string move assignment exchange of values

I was programming some test cases and noticed an odd behaviour.
A move assignment to a string did not clear the source string; instead, the source ended up holding the old value of the target string.
sample code:
#include <utility>
#include <string>
#include <iostream>
int main(void) {
std::string a = "foo";
std::string b = "bar";
std::cout << a << std::endl;
b = std::move(a);
std::cout << a << std::endl;
return 0;
}
result:
$ ./string.exe
foo
bar
expected result:
$ ./string.exe
foo
So to my questions:
Is that intentional?
Does this happen only with strings and/or STL objects?
Does this happen with custom objects (as in user defined)?
Environment:
Win10 64bit
msys2
g++ 5.2
EDIT
After reading the possible duplicate answer and the answer by @OMGtechy,
I extended the test to check for the small string optimization.
#include <utility>
#include <string>
#include <iostream>
#include <cinttypes>
#include <sstream>
int main(void) {
std::ostringstream oss1;
oss1 << "foo ";
std::ostringstream oss2;
oss2 << "bar ";
for (std::uint64_t i(0);;++i) {
oss1 << i % 10;
oss2 << i % 10;
std::string a = oss1.str();
std::string b = oss2.str();
b = std::move(a);
if (a.size() < i) {
std::cout << "move operation origin was cleared at: " << i << std::endl;
break;
}
if (0 == i % 1000)
std::cout << i << std::endl;
}
return 0;
}
This ran on my machine up to 1 MB, which is not a small string anymore.
Then it just stopped, so I could paste the source here (read: I stopped it).
This is likely due to short string optimization; i.e. there's no internal pointer to "move" over, so it ends up acting just like a copy.
I suggest you try this with a string containing a large number of characters; this should be enough to get around short string optimization and exhibit the behaviour you expected.
This is perfectly valid, because the C++ standard states that moved-from objects (with some exceptions; strings are not one of them as of C++11) shall be in a valid but unspecified state.
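Since the moved-from state is unspecified, code that wants to reuse the source string should not rely on it being empty. A minimal sketch of one way to handle that, resetting the source explicitly (the function name `take_and_reset` is hypothetical):

```cpp
#include <string>
#include <utility>

// Move-assigns `source` into a string that already holds a value and
// returns it. Afterwards `source` is valid but unspecified, so it is
// reset explicitly before any reuse.
std::string take_and_reset(std::string& source) {
    std::string target = "bar";
    target = std::move(source);
    source.clear();  // now guaranteed empty, whatever the move left behind
    return target;
}
```

After `take_and_reset(a)`, the returned string holds the old contents of `a`, and `a.empty()` is true on any implementation, whether or not short string optimization applied.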

How do I create a case insensitive regex to match file extensions?

I am trying to match all files that have the extension .nef - The match must be case insensitive.
regex e("(.*)(\\.NEF)",ECMAScript|icase);
...
if (regex_match ( fn1, e )){
//Do Something
}
here fn1 is a string with a file name.
However, this "does something" only with files with .NEF (upper case) extensions. .nef extensions are ignored.
I also tried -
regex e("(.*)(\\.[Nn][Ee][Ff])");
and
regex e("(.*)(\\.[N|n][E|e][F|f])");
both of which resulted in a runtime error.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
My code is compiled using -
g++ nefread.cpp -o nefread -lraw_r -lpthread -pthread -std=c++11 -O3
What am I doing wrong? This is my basic code. I want to extend it to match more file extensions .nef, .raw, .cr2 etc.
Your original expression is correct, and should produce the desired result. The problem is with the gcc implementation of <regex>, which is broken. This answer explains the historical reasons why that is so, and also says that gcc 4.9 will ship with a working <regex> implementation.
Your code works using Boost.Regex
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
// Simple regular expression matching
boost::regex expr(R"((.*)\.(nef))", boost::regex_constants::ECMAScript |
boost::regex_constants::icase);
// the pattern is a raw string literal, so there is no need to escape the '\'
boost::cmatch m;
for (auto const& fname : {"foo.nef", "bar.NeF", "baz.NEF"}) {
if(boost::regex_match(fname, m, expr)) {
std::cout << "matched: " << m[0] << '\n';
std::cout << " " << m[1] << '\n';
std::cout << " " << m[2] << '\n';
}
}
}
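On a standard library with a working <regex>, the same idea should carry over to `std::regex` and extend to the other extensions mentioned in the question. A minimal sketch under that assumption; the function name `has_raw_extension` and the exact extension list are illustrative:

```cpp
#include <regex>
#include <string>

// True when `filename` ends in .nef, .raw, or .cr2, ignoring case.
bool has_raw_extension(const std::string& filename) {
    static const std::regex re(R"(.*\.(nef|raw|cr2))",
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(filename, re);
}
```

Adding another extension then only means adding one more alternative inside the group.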

c++ std regex question mark issue

I'm having trouble with std regex. I can't make the question mark quantifier work. The call to regex_match will always return 0.
I also tried with {0,1} which doesn't behave like I expected either: it behaves like a + quantifier.
Here is my code :
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char **argv){
regex e1("ab?c");
cout << regex_match("ac", e1) << endl; // expected : 1, output 0
cout << regex_match("abc", e1) << endl; // expected : 1, output 0
cout << regex_match("abbc", e1) << endl; // expected : 0, output 0
regex e2("ab{0,1}c");
cout << regex_match("ac", e2) << endl; // expected : 1, output 0
cout << regex_match("abc", e2) << endl; // expected : 1, output 1
cout << regex_match("abbc", e2) << endl; // expected : 0, output 1
return 0;
}
I used the following command to compile:
g++ -std=c++11 main.cpp -o regex_test
Am I doing something wrong here? Or why isn't it working?
Your regular expression code is fine. The implementation that you're using is not. It provides a header that declares a bunch of things that the library implements badly. If a commercial package did that it would be roundly criticized, and rightly so. You get what you pay for.
std::regex is mostly not implemented in gcc (at the time of writing). See section 28 at:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011
The POSIX standard for regular expressions defines two types, basic and extended. The ? operator is an extended feature. Apparently you use
std::regex re2(".*(a|xayy)", std::regex::extended);
to get the extended features.
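Note that the default grammar of `std::regex` is ECMAScript, where `?` and `{0,1}` are available without any flag, so on a working implementation the question's code should behave as expected in either grammar. A small sketch for checking both; the helper names `matches_ecmascript` and `matches_extended` are hypothetical:

```cpp
#include <regex>
#include <string>

// Full-string match using the default ECMAScript grammar.
bool matches_ecmascript(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Full-string match using the POSIX extended grammar.
bool matches_extended(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern, std::regex::extended));
}
```

With `ab?c` or `ab{0,1}c`, both helpers should accept "ac" and "abc" and reject "abbc".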

Regular Expressions misunderstanding or just broken implementation?

I tried a very simple use of regex_search and can not understand why I do not get a match:
Alas, the C++0x implementation in gcc 4.5 does not seem to be working; I get a link error there.
But here is my gcc-4.7.0 try, quite straightforward:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main () {
regex rxWorld("world");
const string text = "hello world!";
const auto t0 = text.cbegin();
smatch match;
const bool ok = regex_search(text, match, rxWorld);
/* ... */
}
I think I should get ok==true and something in match as well. I reduced the example to a very simple regex for this. I tried slightly more complicated first.
But my printing code at /* ... */ says otherwise:
cout << " text:'" << text
<< "' ok:" << ok
<< " size:" << match.size();
cout << " pos:" << match.position()
<< " len:"<< match.length();
for(const auto& sub : match) {
cout << " ["<<(sub.first-t0)<<".."<<(sub.second-t0)
<< ":"<<sub.matched
<< "'"<<sub.str()
<< "']";
}
cout << endl;
The output is:
$ ./regex-search-01.x
text:'hello world!' ok:0 size:0 pos:-1 len:0
Update: I also tried regex_search(t0, text.cend(), match, rxWorld) and const char* text, no change.
Is my understanding of regex_search wrong? I am completely baffled. Or is it just the gcc?
As you can see from the C++-0x status of libstdc++ the regex support is incomplete.
In particular match_results are not finished. Iterators are not even started.
Volunteers are welcome ;-)
[EDIT] As of gcc-4.9, <regex> will be fully supported.
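The question's understanding of regex_search is sound. As a minimal sketch of what a working implementation should report for that example, the hypothetical helper `find_first` below returns the position and length of the first match:

```cpp
#include <regex>
#include <string>
#include <utility>

// Returns {position, length} of the first match, or {-1, 0} if none.
std::pair<long, long> find_first(const std::string& text,
                                 const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return {-1, 0};
    }
    return {static_cast<long>(m.position()), static_cast<long>(m.length())};
}
```

For `find_first("hello world!", "world")` the match starts at offset 6 and has length 5, in contrast to the `pos:-1 len:0` shown in the question's output.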