c++ std regex question mark issue - c++

I'm having trouble with std::regex. I can't make the question mark quantifier work: the call to regex_match always returns 0.
I also tried {0,1}, which doesn't behave as I expected either: it behaves like a + quantifier.
Here is my code :
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char **argv){
    regex e1("ab?c");
    cout << regex_match("ac", e1) << endl;   // expected : 1, output 0
    cout << regex_match("abc", e1) << endl;  // expected : 1, output 0
    cout << regex_match("abbc", e1) << endl; // expected : 0, output 0
    regex e2("ab{0,1}c");
    cout << regex_match("ac", e2) << endl;   // expected : 1, output 0
    cout << regex_match("abc", e2) << endl;  // expected : 1, output 1
    cout << regex_match("abbc", e2) << endl; // expected : 0, output 1
    return 0;
}
I used the following command to compile:
g++ -std=c++11 main.cpp -o regex_test
Am I doing something wrong here? If not, why isn't it working?

Your regular expression code is fine. The implementation that you're using is not. It provides a header that declares a bunch of things that the library implements badly. If a commercial package did that it would be roundly criticized, and rightly so. You get what you pay for.

std::regex is mostly not implemented in gcc (at the time of writing). See section 28 at:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011

The POSIX standard for regular expressions defines two types, basic and extended. The ? operator is an extended feature. Apparently you use
std::regex re2(".*(a|xayy)", std::regex::extended)
to get the extended features.

Related

How do I create a case insensitive regex to match file extensions?

I am trying to match all files that have the extension .nef - The match must be case insensitive.
regex e("(.*)(\\.NEF)",ECMAScript|icase);
...
if (regex_match(fn1, e)) {
    // Do something
}
here fn1 is a string with a file name.
However, this "does something" only with files with .NEF (upper case) extensions. .nef extensions are ignored.
I also tried -
regex e("(.*)(\\.[Nn][Ee][Ff])");
and
regex e("(.*)(\\.[N|n][E|e][F|f])");
both of which resulted in a runtime error.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
My code is compiled using -
g++ nefread.cpp -o nefread -lraw_r -lpthread -pthread -std=c++11 -O3
What am I doing wrong? This is my basic code. I want to extend it to match more file extensions .nef, .raw, .cr2 etc.
Your original expression is correct, and should produce the desired result. The problem is with the gcc implementation of <regex>, which is broken. This answer explains the historical reasons why this is so, and also says that gcc 4.9 will ship with a working <regex> implementation.
Your code works using Boost.Regex
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
    // Simple regular expression matching.
    // No need to escape the '\' if you use raw string literals.
    boost::regex expr(R"((.*)\.(nef))", boost::regex_constants::ECMAScript |
                                        boost::regex_constants::icase);
    boost::cmatch m;
    for (auto const& fname : {"foo.nef", "bar.NeF", "baz.NEF"}) {
        if (boost::regex_match(fname, m, expr)) {
            std::cout << "matched: " << m[0] << '\n';
            std::cout << " " << m[1] << '\n';
            std::cout << " " << m[2] << '\n';
        }
    }
}

C++ regex_match behaviour

I am using C++ regex and was not able to understand the output of the following program.
#include <iostream>
#include <regex>
#include <algorithm>
#include <string>
using namespace std;
int main(){
    regex r("a(b+)(c+)d");
    string s = "abcd";
    smatch m;
    cout << s << endl;
    const bool b = regex_match(s, m, r);
    cout << b << endl; // prints 1 - OK
    if(b){
        cout << m[0] << endl; // prints abcd - OK
        cout << m[1] << endl; // prints ab - Why? Should it be just b?
        cout << m[2] << endl; // prints bc - Why? Should it be just c?
    }
}
As per my exposure to regex in other languages, the parentheses should capture only the matched part of the string, so the output should be
1
abcd
b
c
EDIT:
I am using g++ 4.6
Assuming you are using g++, you should note that its implementation of <regex> (section 28) is incomplete. Note the listings for basic_regex, sub_match, and match_results are declared "Partial".
For more info on g++, I think this post from a year ago is still relevant (as is this bug report).
This would explain why it's not giving the results that you expect. You may wish to try Boost regex in the meantime.

boost::filesystem adding quotation marks?

When using boost_filesystem, Boost keeps adding quotation marks to the filenames.
foo.cpp:
#include <iostream>
#include <boost/filesystem.hpp>
int main( int argc, char * argv[] )
{
    std::cout << argv[0] << std::endl;
    boost::filesystem::path p( argv[0] );
    std::cout << p << std::endl;
    std::cout << p.filename() << std::endl;
    return 0;
}
Compiled:
g++ foo.cpp -o foo -lboost_filesystem -lboost_system
Output:
./foo
"./foo"
"foo"
This is somewhat unexpected, and inconvenient in my case. Is this really intentional, or is my somewhat older version of Boost (1.46.1) buggy in this respect? Is there some way I could avoid them being added?
I perused the documentation, but aside from the tutorials not showing those quotation marks in their example output, I was not enlightened.
This is actually a bug filed against Boost version 1.47.0.
The proposed workaround is:
std::cout << path("/foo/bar.txt").filename().string()
It's intentional, because unexpected embedded spaces can confuse related code. The best you can do is probably:
boost::replace_all(yourquotedstring, "\"", "");
EDIT
Although, according to this link, you can try something like:
std::cout << path("/foo/bar.txt").filename().string();

Why is this non-open ifstream still "good" after I try to extract from it?

In GCC 4.7.0 20111217, GCC 4.1.2, GCC 4.3.4 and GCC 4.5.1:
#include <iostream>
#include <fstream>
int main() {
    std::ifstream f;
    std::cout << f.get() << ", " << f.good() << ", " << f.bad();
}
// Output: -1, 1, 0
I'd have expected -1, 0, 1 (which Clang 3.1 gives me), because of these paragraphs:
[C++11: 27.7.2.3]:
int_type get();
4 Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, extracts a character c, if one is available. Otherwise, the function calls setstate(failbit), which may throw ios_base::failure (27.5.5.4),
5 Returns: c if available, otherwise traits::eof().
and:
[C++11: 27.9.1.1/3]: In particular:
If the file is not open for reading the input sequence cannot be read.
If the file is not open for writing the output sequence cannot be written.
A joint file position is maintained for both the input sequence and the output sequence.
Is GCC in the wrong here? Did an implementer misinterpret the "otherwise" in [C++11: 27.7.2.3/4]? Or did I misinterpret the standard?
The problem is that the order in which the operations on f are called is not specified by the standard. The compiler is free to call f.good(), f.bad(), and f.get() in any order it chooses.
Try changing to print it on separate lines.
I've just realised I'm being silly:
#include <iostream>
#include <fstream>
int main() {
    std::ifstream f;
    std::cout << f.get() << ", ";
    std::cout << f.good() << ", " << f.bad();
}
// Output: -1, 0, 0
Unspecified evaluation order. Duh.

Regular Expressions misunderstanding or just broken implementation?

I tried a very simple use of regex_search and can not understand why I do not get a match:
Alas, the C++0x implementation in gcc 4.5 does not seem to work; I get a link error here.
But here is my gcc-4.7.0 try, quite straightforward:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main () {
    regex rxWorld("world");
    const string text = "hello world!";
    const auto t0 = text.cbegin();
    smatch match;
    const bool ok = regex_search(text, match, rxWorld);
    /* ... */
}
I think I should get ok==true and something in match as well. I reduced the example to a very simple regex for this. I tried slightly more complicated first.
But my printing code at /* ... */ says otherwise:
cout << " text:'" << text
     << "' ok:" << ok
     << " size:" << match.size();
cout << " pos:" << match.position()
     << " len:" << match.length();
for (const auto& sub : match) {
    cout << " [" << (sub.first - t0) << ".." << (sub.second - t0)
         << ":" << sub.matched
         << "'" << sub.str()
         << "']";
}
cout << endl;
The output is:
$ ./regex-search-01.x
text:'hello world!' ok:0 size:0 pos:-1 len:0
Update: I also tried regex_search(t0, text.cend(), match, rxWorld) and const char* text, no change.
Is my understanding of regex_search wrong? I am completely baffled. Or is it just gcc?
As you can see from the C++0x status of libstdc++, the regex support is incomplete.
In particular, match_results is not finished. Iterators are not even started.
Volunteers are welcome ;-)
[EDIT] As of gcc-4.9, regex will be fully supported.