This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 7 years ago.
I'm trying to get my code to behave the same on both OS X and Linux.
The code below is compiled with clang++ --std=c++11 regextest.cpp:
#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string str = "/api/asd/";
    std::string pattern = "/api/(.*)/";
    std::cout << "Starting matching" << std::endl;
    std::smatch matches;
    if (std::regex_match(str, matches, std::regex(pattern, std::regex::egrep)))
    {
        std::cout << "Found match!" << std::endl;
        std::cout << "All matches: ";
        for (auto& it : matches)
            std::cout << it << ", ";
        std::cout << std::endl;
    }
    return 0;
}
On OS X, the result of running this code is:
Starting matching
Found match!
All matches: /api/asd/, asd,
On Linux, on the other hand (Gentoo, libstdc++ 3.3)
Starting matching
Found match!
All matches: /api/asd/, /asd/, //
How does it match /api/ on Linux? Why?
Additionally, trying to use a pattern like /api/([^/]) fails completely in Linux and matches nothing but works well in OS X.
I've tried many combinations of match types, (basic, extended, grep, egrep, awk) with escaped and unescaped ( and ) (depending on the match type) and nothing produces the expected results on Linux.
As suggested by the comments, this issue was solved by upgrading gcc to 4.9. (~amd64 flag currently required to do this on Gentoo).
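Since the fix here was a toolchain upgrade, a small runtime check can show whether the <regex> in use returns the expected captures before the rest of the program relies on it. This is only a sketch and not part of the original post: the names regex_is_usable and regex_check_example are hypothetical, and the pattern uses [^/]* for the capture instead of the patterns tried above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true when <regex> yields the expected
// captures for the URL used in the question.
bool regex_is_usable() {
    try {
        const std::string str = "/api/asd/";
        const std::regex re("/api/([^/]*)/");  // default ECMAScript grammar
        std::smatch m;
        // Expect exactly two sub-matches: the whole string and "asd".
        return std::regex_match(str, m, re) && m.size() == 2 && m[1].str() == "asd";
    } catch (const std::regex_error&) {
        // An incomplete implementation may throw on a valid pattern.
        return false;
    }
}

// Hypothetical usage: stop early in debug builds if <regex> misbehaves.
void regex_check_example() {
    assert(regex_is_usable());
}
```

With a complete implementation (libc++, or libstdc++ from gcc 4.9 on) the check should pass; on the older libstdc++ described above it would be expected to fail or throw.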
I suppose it's something very stupid, but this does not match, and I have no idea why.
It compiles successfully, but it just doesn't match.
I've already tried RE(".*"), but that doesn't work either.
The system is OS X (PCRE installed using brew).
std::string s;
if (pcrecpp::RE("h.*o").FullMatch("hello", &s))
{
    std::cout << "Successful match " << s << std::endl;
}
You are trying to extract one subpattern (in &s), but have not included any parentheses to capture that subpattern. Try this (untested, note parentheses).
std::string s;
if (pcrecpp::RE("(h.*o)").FullMatch("hello", &s))
{
    std::cout << "Successful match " << s << std::endl;
}
The documentation at http://www.pcre.org/original/doc/html/pcrecpp.html has a similar example, stating:
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
Following the example here, I wrote the following code:
using namespace std::regex_constants;
std::string str("{trol,asdfsad},{safsa, aaaaa,aaaaadfs}");
std::smatch m;
std::regex r("\\{(.*)\\}"); // matches anything between {}
std::cout << "Initiating search..." << std::endl;
while (std::regex_search(str, m, r)) {
    for (auto x : m) {
        std::cout << x << " ";
    }
    std::cout << std::endl;
    str = m.suffix().str();
}
But to my surprise, it doesn't find anything at all, which I don't understand. I would understand if the regex matched the whole string, since .* is greedy, but nothing at all? What am I doing wrong here?
To be clear: I know that regexes are not suitable for parsing, but I won't have to deal with deeper levels of bracket nesting, so I find regexes good enough here.
If you want to use basic POSIX syntax, your regex should be
{\\(.*\\)}
If you want to use default ECMAScript, your regex should be
\\{(.*)\\}
With clang and libc++, or with gcc 4.9+ (earlier gcc versions do not fully support regex), your code gives:
Initiating search...
{trol,asdfsad},{safsa, aaaaa,aaaaadfs} trol,asdfsad},{safsa, aaaaa,aaaaadfs
Eventually it turned out that the gcc version really was the problem, so I finally got it working using the boost::regex library and the following code:
std::string str("{trol,asdfsad},{safsa,aaaaa,aaaaadfs}");
boost::regex rex("\\{(.*?)\\}", boost::regex_constants::perl);
boost::smatch result;
while (boost::regex_search(str, result, rex)) {
    // std::size_t replaces the non-standard uint
    for (std::size_t i = 0; i < result.size(); ++i) {
        std::cout << result[i] << " ";
    }
    std::cout << std::endl;
    str = result.suffix().str();
}
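For readers whose toolchain has a working <regex> (libc++, or gcc 4.9 and later), the same non-greedy extraction can be sketched with the standard library alone. This is an illustration under that assumption rather than the poster's code; brace_groups and brace_groups_example are hypothetical names, and std::sregex_iterator is used in place of the repeated regex_search loop.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects the text between each pair of braces (no nesting assumed).
std::vector<std::string> brace_groups(const std::string& input) {
    // ".*?" is the non-greedy form, so each match stops at the first '}'.
    static const std::regex re(R"(\{(.*?)\})");
    std::vector<std::string> groups;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        groups.push_back((*it)[1].str());  // sub-match 1 is the captured group
    }
    return groups;
}

// Hypothetical usage with the string from the answer above.
void brace_groups_example() {
    const std::vector<std::string> groups =
        brace_groups("{trol,asdfsad},{safsa,aaaaa,aaaaadfs}");
    assert(groups.size() == 2);
    assert(groups[0] == "trol,asdfsad");
    assert(groups[1] == "safsa,aaaaa,aaaaadfs");
}
```

Iterating with std::sregex_iterator avoids copying the suffix back into the input string on every pass, which the loop above does.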
I am trying to match all files that have the extension .nef - The match must be case insensitive.
regex e("(.*)(\\.NEF)",ECMAScript|icase);
...
if (regex_match ( fn1, e )){
//Do Something
}
Here fn1 is a string holding a file name.
However, this "does something" only with files with .NEF (upper case) extensions. .nef extensions are ignored.
I also tried -
regex e("(.*)(\\.[Nn][Ee][Ff])");
and
regex e("(.*)(\\.[N|n][E|e][F|f])");
both of which resulted in a runtime error.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
My code is compiled using -
g++ nefread.cpp -o nefread -lraw_r -lpthread -pthread -std=c++11 -O3
What am I doing wrong? This is my basic code. I want to extend it to match more file extensions .nef, .raw, .cr2 etc.
Your original expression is correct and should produce the desired result. The problem is the gcc implementation of <regex>, which is broken. This answer explains the historical reasons why that is so, and also says that gcc 4.9 will ship with a working <regex> implementation.
Your code works using Boost.Regex:
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
    // Simple regular expression matching.
    // With a raw string literal there is no need to escape the '\'.
    boost::regex expr(R"((.*)\.(nef))", boost::regex_constants::ECMAScript |
                                        boost::regex_constants::icase);
    boost::cmatch m;
    for (auto const& fname : {"foo.nef", "bar.NeF", "baz.NEF"}) {
        if (boost::regex_match(fname, m, expr)) {
            std::cout << "matched: " << m[0] << '\n';
            std::cout << " " << m[1] << '\n';
            std::cout << " " << m[2] << '\n';
        }
    }
}
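On a toolchain where <regex> is complete, the same case-insensitive check, extended to the other extensions the question mentions, might look like the sketch below. The function names is_raw_image and is_raw_image_example are hypothetical; the extension list (.nef, .raw, .cr2) comes from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the whole file name ends in one of the listed
// extensions, ignoring case.
bool is_raw_image(const std::string& file_name) {
    static const std::regex re(
        R"(.*\.(nef|raw|cr2))",
        std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_match(file_name, re);
}

// Hypothetical usage.
void is_raw_image_example() {
    assert(is_raw_image("foo.nef"));
    assert(is_raw_image("bar.NeF"));
    assert(is_raw_image("baz.CR2"));
    assert(!is_raw_image("photo.jpg"));
}
```

Adding another extension is then a matter of adding one more alternative inside the parentheses.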
I want to parse a token that looks like this:
1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=
I use a regular expression ([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}), and it does the job when I try it on refiddle.
Then I try it with C++:
std::regex regexp(R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}))",
                  std::regex_constants::basic);
std::smatch match;
if (std::regex_search(stringified, match, regexp)) {
    cout << match[0] << ',' << match[1] << ',' << match[2] << endl;
} else {
    cout << "No matches found" << endl;
}
I compile it on Ubuntu 13.10 x64 using GCC 4.8.1 with the -std=c++11 flag, but I always get No matches found. What am I doing wrong?
You were specifying the POSIX basic regex grammar; in that format you must escape () and {}.
I was able to get matches with a few changes:
#include <iostream>
#include <regex>
#include <string>

int main(int argc, const char * argv[]){
    using std::cout;
    using std::endl;
    // POSIX basic grammar: groups and intervals are written \( \) and \{ \}
    std::regex regexp(R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9\\+/=]\{28\}\))",std::regex_constants::basic);
    std::smatch match;
    std::string stringified = "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
    if (std::regex_search(stringified, match, regexp)) {
        cout << match[1] << "," << match[2] << "," << match[3]<< endl;
    } else {
        cout << "No matches found" << endl;
    }
    return 0;
}
Or you could use:
std::regex_constants::extended
If you use std::regex_constants::extended, you should not escape () and {}.
If you don't want to use a raw string, you can do that as well:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})",std::regex_constants::extended);
You'll just have to double up on the \\ to properly escape them. The above regex also works with the default regex grammar, std::regex_constants::ECMAScript:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})");
It looks like GCC just added regex support in the development branch of GCC 4.9.
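It appears that you need to use the 'extended' syntax. Change regex_constants::basic to regex_constants::extended and it will match.
You need the extended syntax in order to capture with unescaped ( ) and to use unescaped { } intervals, as your pattern does; the basic syntax only accepts the escaped forms \( \) and \{ \}.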
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
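To make the difference between the two POSIX grammars concrete, here is a small sketch that compiles the token pattern once per grammar and checks that both produce the same three fields. It assumes a complete <regex> implementation (libc++, or gcc 4.9 and later). The names split_token and split_token_example are hypothetical, and the character class is written without the backslash for simplicity.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns the three captured fields, or an empty vector when the token
// does not match. use_basic selects the POSIX basic grammar.
std::vector<std::string> split_token(const std::string& token, bool use_basic) {
    // Basic grammar: groups and intervals must be escaped.
    static const std::regex basic_re(
        R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9+/=]\{28\}\))",
        std::regex_constants::basic);
    // Extended grammar: the same pattern without the escapes.
    static const std::regex extended_re(
        R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9+/=]{28}))",
        std::regex_constants::extended);
    std::smatch m;
    if (!std::regex_match(token, m, use_basic ? basic_re : extended_re)) {
        return {};
    }
    return {m[1].str(), m[2].str(), m[3].str()};
}

// Hypothetical usage with the token from the question.
void split_token_example() {
    const std::string token =
        "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
    const std::vector<std::string> basic = split_token(token, true);
    const std::vector<std::string> extended = split_token(token, false);
    assert(basic.size() == 3);
    assert(basic == extended);
    assert(basic[1] == "1384537090");
}
```

Both grammars capture; they differ only in whether the grouping and interval characters are escaped.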
This is just example code from C++ Primer, 5th Edition, section 17.3.3, "Using the Regular Expression Library".
Main file main.cpp:
#include <iostream>
#include "regexcase.h"
using namespace std;
int main() {
using_regex();
return 0;
}
Header file regexcase.h:
#ifndef REGEXCASE_H_
#define REGEXCASE_H_
#include <regex>
#include <string>
void using_regex();
std::string parseCode(std::regex_constants::error_type etype);
#endif /* REGEXCASE_H_ */
Source file regexcase.cpp:
#include "regexcase.h"
#include <iostream>
using namespace std;
void using_regex() {
// look for words that violate a well-known spelling rule of thumb, "i before e, except after c":
// find the characters ei that follow a character other than c
string pattern("[^c]ei");
// we want the whole word in which our pattern appears
pattern = "[a-zA-Z]*" + pattern + "[a-zA-Z]*"; //[a-zA-Z]* [[:alpha:]]*
try {
regex r(pattern, regex_constants::extended); // construct a regex to find pattern // , regex_constants::extended
smatch results; // define an object to hold the results of a search
// define a string that has text that does and doesn't match pattern
string test_str = "receipt freind theif receive";
// use r to find a match to pattern in test_str
if (regex_search(test_str, results, r)) // if there is a match
cout << results.str() << endl; // print the matching word
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
string parseCode(regex_constants::error_type etype) {
switch (etype) {
case regex_constants::error_collate:
return "error_collate: invalid collating element request";
case regex_constants::error_ctype:
return "error_ctype: invalid character class";
case regex_constants::error_escape:
return "error_escape: invalid escape character or trailing escape";
case regex_constants::error_backref:
return "error_backref: invalid back reference";
case regex_constants::error_brack:
return "error_brack: mismatched bracket([ or ])";
case regex_constants::error_paren:
return "error_paren: mismatched parentheses(( or ))";
case regex_constants::error_brace:
return "error_brace: mismatched brace({ or })";
case regex_constants::error_badbrace:
return "error_badbrace: invalid range inside a { }";
case regex_constants::error_range:
return "error_range: invalid character range(e.g., [z-a])";
case regex_constants::error_space:
return "error_space: insufficient memory to handle this regular expression";
case regex_constants::error_badrepeat:
return "error_badrepeat: a repetition character (*, ?, +, or {) was not preceded by a valid regular expression";
case regex_constants::error_complexity:
return "error_complexity: the requested match is too complex";
case regex_constants::error_stack:
return "error_stack: insufficient memory to evaluate a match";
default:
return "";
}
}
The output of calling using_regex() is: what: regex_error; code: error_brack: mismatched bracket([ or ])
It seems that the regex can't parse the bracket.
Following the answers to this question, I used regex_constants::extended to initialize the regex object: regex r(pattern, regex_constants::extended);
The output is then: no match for [[:alpha:]]*[^c]ei[[:alpha:]]*
It seems that the regex can't match the pattern.
Then I used [a-zA-Z]* in place of the character class [[:alpha:]]* (with regex_constants::extended still set). The output is still: no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
Platform: Windows
Tools used: Eclipse for C/C++; MinGW (g++ --version: g++ 4.7.2)
EDIT:
Thanks @sharth, I added the main file to complete the code.
I just did a test using libc++ and clang++. This works as expected. Here's my main:
int main() {
    string test_str = "receipt freind theif receive";
    string pattern = "[a-zA-Z]*[^c]ei[a-zA-Z]*";
    try {
        regex r(pattern, regex_constants::extended);
        smatch results;
        if (regex_search(test_str, results, r))
            cout << results.str() << endl;
        else
            cout << "no match for " << pattern << endl;
    } catch (regex_error &e) {
        cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
    }
}
Output:
freind
GCC 4.7.2, on the other hand, gives this result:
no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
This is because libstdc++ in GCC 4.7.2 still doesn't implement regex. Here's its implementation of regex_search:
template<typename _Bi_iter, typename _Allocator, typename _Ch_type, typename _Rx_traits>
inline bool regex_search(_Bi_iter __first, _Bi_iter __last,
                         match_results<_Bi_iter, _Allocator>& __m,
                         const basic_regex<_Ch_type, _Rx_traits>& __re,
                         regex_constants::match_flag_type __flags) {
    return false;
}
And just to note, it is very helpful to include a small program that readers could compile. That way there is no confusion about what code is being run.
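In that spirit, here is a small self-contained sketch that collects every word violating the rule, not just the first one. It assumes a working <regex> (libc++, or gcc 4.9 and later) and uses the default ECMAScript grammar; the names ie_violations and ie_violations_example are hypothetical and not from the book.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every word in text that contains "ei" preceded by a
// character other than 'c'.
std::vector<std::string> ie_violations(const std::string& text) {
    static const std::regex re("[a-zA-Z]*[^c]ei[a-zA-Z]*");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        words.push_back(it->str());  // the whole matched word
    }
    return words;
}

// Hypothetical usage with the test string from the question.
void ie_violations_example() {
    const std::vector<std::string> words =
        ie_violations("receipt freind theif receive");
    assert(words.size() == 2);
    assert(words[0] == "freind");
    assert(words[1] == "theif");
}
```

The words receipt and receive are skipped because their "ei" follows a 'c'.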