C++11 regex error [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 10 years ago.
This is just example code from C++ Primer, 5th Edition, section 17.3.3, "Using the Regular Expression Library".
Main file main.cpp:
#include <iostream>
#include "regexcase.h"
using namespace std;
int main() {
using_regex();
return 0;
}
Header file regexcase.h:
#ifndef REGEXCASE_H_
#define REGEXCASE_H_
#include <regex>
#include <string>
void using_regex();
std::string parseCode(std::regex_constants::error_type etype);
#endif /* REGEXCASE_H_ */
Source file regexcase.cpp:
#include "regexcase.h"
#include <iostream>
using namespace std;
void using_regex() {
// look for words that violate a well-known spelling rule of thumb, "i before e, except after c":
// find the characters ei that follow a character other than c
string pattern("[^c]ei");
// we want the whole word in which our pattern appears
pattern = "[a-zA-Z]*" + pattern + "[a-zA-Z]*"; // alternative tried: [[:alpha:]]* instead of [a-zA-Z]*
try {
regex r(pattern, regex_constants::extended); // construct a regex to find pattern, using the POSIX extended grammar
smatch results; // define an object to hold the results of a search
// define a string that has text that does and doesn't match pattern
string test_str = "receipt freind theif receive";
// use r to find a match to pattern in test_str
if (regex_search(test_str, results, r)) // if there is a match
cout << results.str() << endl; // print the matching word
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
string parseCode(regex_constants::error_type etype) {
switch (etype) {
case regex_constants::error_collate:
return "error_collate: invalid collating element request";
case regex_constants::error_ctype:
return "error_ctype: invalid character class";
case regex_constants::error_escape:
return "error_escape: invalid escape character or trailing escape";
case regex_constants::error_backref:
return "error_backref: invalid back reference";
case regex_constants::error_brack:
return "error_brack: mismatched bracket([ or ])";
case regex_constants::error_paren:
return "error_paren: mismatched parentheses(( or ))";
case regex_constants::error_brace:
return "error_brace: mismatched brace({ or })";
case regex_constants::error_badbrace:
return "error_badbrace: invalid range inside a { }";
case regex_constants::error_range:
return "error_range: invalid character range(e.g., [z-a])";
case regex_constants::error_space:
return "error_space: insufficient memory to handle this regular expression";
case regex_constants::error_badrepeat:
return "error_badrepeat: a repetition character (*, ?, +, or {) was not preceded by a valid regular expression";
case regex_constants::error_complexity:
return "error_complexity: the requested match is too complex";
case regex_constants::error_stack:
return "error_stack: insufficient memory to evaluate a match";
default:
return "";
}
}
Calling using_regex() prints: what: regex_error; code: error_brack: mismatched bracket([ or ])
It seems the regex can't parse the bracket expression.
Following the answers to this question, I passed regex_constants::extended when constructing the regex object: regex r(pattern, regex_constants::extended);
The output is then: no match for [[:alpha:]]*[^c]ei[[:alpha:]]*
It seems the regex can't match the pattern.
I then replaced the character class [[:alpha:]]* with [a-zA-Z]* (still with regex_constants::extended set). The output is still: no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
Platform: windows
Tools used: Eclipse for C/C++; MinGW (g++ --version: g++ 4.7.2)
EDIT:
Thanks @sharth; I added the main file to complete the code.

I just did a test using libc++ and clang++. This works as expected. Here's my main:
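#include <iostream>
#include "regexcase.h" // declares parseCode() from the question; link with regexcase.cpp
using namespace std;
int main() {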
string test_str = "receipt freind theif receive";
string pattern = "[a-zA-Z]*[^c]ei[a-zA-Z]*";
try {
regex r(pattern, regex_constants::extended);
smatch results;
if (regex_search(test_str, results, r))
cout << results.str() << endl;
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
Output:
freind
On the other hand, GCC 4.7.2 gives this result:
no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
This is because GCC 4.7.2's libstdc++ does not implement regex yet. Here's its implementation of regex_search:
template<typename _Bi_iter, typename _Allocator, typename _Ch_type, typename _Rx_traits>
inline bool regex_search(_Bi_iter __first, _Bi_iter __last, match_results<_Bi_iter, _Allocator>& __m, const basic_regex<_Ch_type, _Rx_traits>& __re, regex_constants::match_flag_type __flags) {
return false;
}
And just to note, it is very helpful to include a small program that readers could compile. That way there is no confusion about what code is being run.

Related

How to match string with wildcard using C++11 regex

This is a list of static strings; each uses a wildcard only at the beginning or end of the string, with no other regex rules:
AAAA, BBBB*, *CCCC, *DDDD*
I need to find out whether a given string matches any of the strings in this list. I'm looking for something like this:
bool isMatch(std::string str)
{
std::vector<std::string> my_list = {"AAAA", "BBBB*", "*CCCC", "*DDDD*"};
if (str.matchAny(my_list)) // matchAny is the kind of operation I'm looking for; it doesn't exist
return true;
return false;
}
I'd rather not use any third-party library such as Boost. Can this be achieved with C++11 std::regex, or is there another simple way?
A regular expression would be overkill here. Just look for each of the character sequences in the appropriate place:
str == "AAAA"
str.find("BBBB") == 0
str.size() >= 4 && str.compare(str.size() - 4, 4, "CCCC") == 0
str.find("DDDD") != std::string::npos
Here's how I usually do it: I replace "\\*" with ".*" and "\\?" with ".".
Here's the C++ code for it.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
regex star_replace("\\*");
regex questionmark_replace("\\?");
string data = "AAAABBBCCDDDD";
string pattern = "*CC*";
auto wildcard_pattern = regex_replace(
regex_replace(pattern, star_replace, ".*"),
questionmark_replace, ".");
cout << "Wildcard: " << pattern << " Regex: " << wildcard_pattern << endl;
regex wildcard_regex("^" + wildcard_pattern + "$");
if (regex_match(data, wildcard_regex))
cout << "Match!" << endl;
else
cout << "No match!" << endl;
return 0;
}
Here's a link to runnable code on onlinegdb

PCRE does not match

I suppose it's something very stupid, but this does not match, and I have no idea why.
It compiles successfully and everything, but it just doesn't match.
I've also tried RE(".*"), but that doesn't work either.
The system is OS X (installed pcre using brew).
std::string s;
if (pcrecpp::RE("h.*o").FullMatch("hello", &s))
{
std::cout << "Successful match " << s << std::endl;
}
You are trying to extract one subpattern (into &s), but have not included any parentheses to capture that subpattern. Try this (untested; note the parentheses):
std::string s;
if (pcrecpp::RE("(h.*o)").FullMatch("hello", &s))
{
std::cout << "Successful match " << s << std::endl;
}
The documentation at http://www.pcre.org/original/doc/html/pcrecpp.html has a similar example, stating:
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

How do I create a case insensitive regex to match file extensions?

I am trying to match all files that have the extension .nef; the match must be case-insensitive.
regex e("(.*)(\\.NEF)",ECMAScript|icase);
...
if (regex_match ( fn1, e )){
//Do Something
}
Here fn1 is a string holding a file name.
However, this "does something" only for files with an uppercase .NEF extension; .nef extensions are ignored.
I also tried
regex e("(.*)(\\.[Nn][Ee][Ff])");
and
regex e("(.*)(\\.[N|n][E|e][F|f])");
both of which resulted in a runtime error.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
My code is compiled with:
g++ nefread.cpp -o nefread -lraw_r -lpthread -pthread -std=c++11 -O3
What am I doing wrong? This is my basic code; I want to extend it to match more file extensions, such as .nef, .raw, .cr2, etc.
Your original expression is correct and should produce the desired result. The problem is GCC's implementation of <regex>, which is broken. This answer explains the historical reasons why, and also says that GCC 4.9 will ship with a working <regex> implementation.
Your code works using Boost.Regex
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
// Simple regular expression matching
boost::regex expr(R"((.*)\.(nef))", boost::regex_constants::ECMAScript |
boost::regex_constants::icase);
// no need to escape the '\' if you use raw string literals
boost::cmatch m;
for (auto const& fname : {"foo.nef", "bar.NeF", "baz.NEF"}) {
if(boost::regex_match(fname, m, expr)) {
std::cout << "matched: " << m[0] << '\n';
std::cout << " " << m[1] << '\n';
std::cout << " " << m[2] << '\n';
}
}
}
Live demo

C++11 regex doesn't match the string [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 9 years ago.
I want to parse a token that looks like this:
1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=
I use a regular expression ([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}), and it does the job when I try it on refiddle.
Then I try it with C++:
std::regex regexp(R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}))",
std::regex_constants::basic);
std::smatch match;
if (std::regex_search(stringified, match, regexp)) {
cout << match[0] << ',' << match[1] << ',' << match[2] << endl;
} else {
cout << "No matches found" << endl;
}
I compile it on Ubuntu 13.10 x64 using GCC 4.8.1 with the -std=c++11 flag, but I always get "No matches found". What am I doing wrong?
You were specifying the POSIX basic grammar; in that grammar you must escape () and {}.
I was able to get matches with a few changes:
#include <iostream>
#include <regex>
#include <string>
int main(int argc, const char * argv[]){
using std::cout;
using std::endl;
std::regex regexp(R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9\\+/=]\{28\}\))",std::regex_constants::basic);
std::smatch match;
std::string stringified = "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
if (std::regex_search(stringified, match, regexp)) {
cout << match[1] << "," << match[2] << "," << match[3]<< endl;
} else {
cout << "No matches found" << endl;
}
return 0;
}
Or you could use std::regex_constants::extended, in which case you should not escape () and {}.
If you don't want to use a raw string, you can do that as well:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})",std::regex_constants::extended);
You just have to double the backslashes to escape them properly. The regex above also works with the default grammar, std::regex_constants::ECMAScript:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})");
It looks like GCC just added regex support in the development branch for GCC 4.9.
It appears that you need to use 'extended' syntax. Change regex_constants::basic to regex_constants::extended and it will match.
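In basic syntax, unescaped ( ) and { } are ordinary characters, so the pattern as written creates no capture groups and no {16}-style repetition; extended syntax treats them as operators.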
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04

Define regexp pattern as unsigned char

Is there any way I can create a regex pattern that contains an unsigned char? I've tried:
regex* r = new regex("\\xff");
which results in an exception saying the pattern character is out of range. I've also tried defining my own basic_regex and regex_traits, following the code in the <regex> header, but that results in a strange error inside that header.
Any help would be appreciated.
This is a legal HexEscapeSequence in ECMAScript regex (which is what C++ uses by default), and it appears to work in my tests:
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::regex re("\\xff");
std::string test = "abc\xff";
std::smatch m;
regex_search(test, m, re);
std::cout << "Found " << m.size() << " match after '"
<< m.prefix() << "'\n";
}
clang++/libc++: Found 1 match after 'abc'
g++/boost.regex (replacing std:: with boost::): Found 1 match after 'abc'
What's your implementation?