Is there any way I can create a regex pattern that contains an unsigned char? I've tried:
regex* r = new regex("\\xff");
which results in an exception saying the pattern character is out of range. I've also tried to define my own basic_regex and my own regex_traits, following the code in the regex include file, but that results in a strange error in the local include.
Any help would be appreciated.
This is a legal HexEscapeSequence in ECMAScript regex (which is what C++ uses by default), and it appears to work in my tests:
#include <iostream>
#include <regex>
int main()
{
std::regex re("\\xff");
std::string test = "abc\xff";
std::smatch m;
regex_search(test, m, re);
std::cout << "Found " << m.size() << " match after '"
<< m.prefix() << "'\n";
}
clang++/libc++: Found 1 match after 'abc'
g++/boost.regex (replacing std:: with boost::): Found 1 match after 'abc'
What's your implementation?
I'm a bit confused about the following C++11 code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string haystack("abcdefabcghiabc");
std::regex needle("abc");
std::smatch matches;
std::regex_search(haystack, matches, needle);
std::cout << matches.size() << std::endl;
}
I'd expect it to print out 3 but instead I get 1. Am I missing something?
You get 1 because regex_search returns only 1 match, and size() will return the number of capture groups + the whole match value.
Your matches is...:
Object of a match_results type (such as cmatch or smatch) that is filled by this function with information about the match results and any submatches found.
If [the regex search is] successful, it is not empty and contains a series of sub_match objects: the first sub_match element corresponds to the entire match, and, if the regex expression contained sub-expressions to be matched (i.e., parentheses-delimited groups), their corresponding sub-matches are stored as successive sub_match elements in the match_results object.
Here is code that finds multiple matches:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
string str("abcdefabcghiabc");
int i = 0;
regex rgx1("abc");
smatch smtch;
while (regex_search(str, smtch, rgx1)) {
std::cout << i << ": " << smtch[0] << std::endl;
i += 1;
str = smtch.suffix().str();
}
return 0;
}
This prints abc three times, numbered 0 to 2.
As this method destroys the input string, here is another alternative based on the std::sregex_iterator (std::wsregex_iterator should be used when your subject is an std::wstring object):
#include <iostream>
#include <regex>
#include <string>
int main() {
std::regex r("ab(c)");
std::string s = "abcdefabcghiabc";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i)
{
std::smatch m = *i;
std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
std::cout << " Capture: " << m[1].str() << " at Position " << m.position(1) << '\n';
}
return 0;
}
This prints:
Match value: abc at Position 0
Capture: c at Position 2
Match value: abc at Position 6
Capture: c at Position 8
Match value: abc at Position 12
Capture: c at Position 14
What you're missing is that matches is populated with one entry for each capture group (including the entire matched substring as the 0th capture).
If you write
std::regex needle("a(b)c");
then you'll get matches.size()==2, with matches[0]=="abc", and matches[1]=="b".
EDIT: Some people have downvoted this answer. That may be for a variety of reasons, but if it is because it does not apply to the answer I criticized (no one left a comment to explain the decision), they should take note that W. Stribizew changed the code two months after I wrote this, and I was unaware of it until today, 2021-01-18. The rest of the answer is unchanged from when I first wrote it.
@stribizhev's solution has quadratic worst case complexity for sane regular expressions. For insane ones (e.g. "y*"), it doesn't terminate. In some applications, these issues could be DoS attacks waiting to happen. Here's a fixed version:
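string str("abcdefabcghiabc");
int i = 0;
regex rgx1("abc");
smatch smtch;
auto beg = str.cbegin();
while (regex_search(beg, str.cend(), smtch, rgx1)) {
    std::cout << i << ": " << smtch[0] << std::endl;
    i += 1;
    // Resume right after the end of this match. Advancing by the match
    // length alone is not enough, because the match may start after beg.
    beg = smtch[0].second;
    if ( smtch.length(0) == 0 ) {
        // Empty match: step one character forward so the loop terminates.
        if ( beg == str.cend() )
            break;
        ++beg;
    }
}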
According to my personal preference, this will find n+1 matches of an empty regex in a string of length n. You might also just exit the loop after an empty match.
If you want to compare the performance for a string with millions of matches, add the following lines after the definition of str (and don't forget to turn on optimizations), once for each version:
for (int j = 0; j < 20; ++j)
str = str + str;
This is a list of static strings. Each one uses a wildcard only at the beginning or end of the string; no other regex rules apply.
AAAA, BBBB*, *CCCC, *DDDD*
I need to check whether a given string matches any of the strings in this list. I'm looking for something like this:
bool isMatch(std::string str)
{
std::vector<string> my_list = {AAAA, BBBB*, *CCCC, *DDDD*};
if(str.matchAny(my_list))
return true;
return false;
}
I'd rather not use any third-party library such as Boost. Can this be achieved with C++11 std::regex? Or is there a simpler way?
A regular expression would be overkill here. Just look for each of the character sequences in the appropriate place:
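str == "AAAA" // AAAA: exact match
str.compare(0, 4, "BBBB") == 0 // BBBB*: starts with BBBB
str.size() >= 4 && str.compare(str.size() - 4, 4, "CCCC") == 0 // *CCCC: ends with CCCC
str.find("DDDD") != std::string::npos // *DDDD*: contains DDDD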
Here's how I've usually done it: I replace "\\*" with ".*" and "\\?" with ".".
Here's the C++ code for it.
#include <iostream>
#include <regex>
using namespace std;
int main()
{
regex star_replace("\\*");
regex questionmark_replace("\\?");
string data = "AAAABBBCCDDDD";
string pattern = "*CC*";
auto wildcard_pattern = regex_replace(
regex_replace(pattern, star_replace, ".*"),
questionmark_replace, ".");
cout << "Wildcard: " << pattern << " Regex: " << wildcard_pattern << endl;
regex wildcard_regex("^" + wildcard_pattern + "$");
if (regex_match(data, wildcard_regex))
cout << "Match!" << endl;
else
cout << "No match!" << endl;
return 0;
}
I am trying to match all files that have the extension .nef. The match must be case insensitive.
regex e("(.*)(\\.NEF)",ECMAScript|icase);
...
if (regex_match ( fn1, e )){
//Do Something
}
Here fn1 is a string containing a file name.
However, this "does something" only with files with .NEF (upper case) extensions. .nef extensions are ignored.
I also tried -
regex e("(.*)(\\.[Nn][Ee][Ff])");
and
regex e("(.*)(\\.[N|n][E|e][F|f])");
both of which resulted in a runtime error.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
My code is compiled using -
g++ nefread.cpp -o nefread -lraw_r -lpthread -pthread -std=c++11 -O3
What am I doing wrong? This is my basic code. I want to extend it to match more file extensions .nef, .raw, .cr2 etc.
Your original expression is correct, and should produce the desired result. The problem is with the gcc implementation of <regex>, which is broken. This answer explains the historical reasons why this is so, and also says that gcc 4.9 will ship with a working <regex> implementation.
Your code works using Boost.Regex:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
// Simple regular expression matching
boost::regex expr(R"((.*)\.(nef))", boost::regex_constants::ECMAScript |
boost::regex_constants::icase);
// ^^^ ^^
// no need to escape the '\' if you use raw string literals
boost::cmatch m;
for (auto const& fname : {"foo.nef", "bar.NeF", "baz.NEF"}) {
if(boost::regex_match(fname, m, expr)) {
std::cout << "matched: " << m[0] << '\n';
std::cout << " " << m[1] << '\n';
std::cout << " " << m[2] << '\n';
}
}
}
This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 9 years ago.
I want to parse a token that looks like this:
1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=
I use a regular expression ([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}), and it does the job when I try it on refiddle.
Then I try it with C++:
std::regex regexp(R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}))",
std::regex_constants::basic);
std::smatch match;
if (std::regex_search(stringified, match, regexp)) {
cout << match[0] << ',' << match[1] << ',' << match[2] << endl;
} else {
cout << "No matches found" << endl;
}
I compile it on Ubuntu 13.10 x64 using GCC 4.8.1 with -std=c++11 flag. But I always get No matches found. What am I doing wrong?
You were specifying POSIX basic regex; in that format you must escape () and {}.
I was able to get matches with a few changes:
int main(int argc, const char * argv[]){
using std::cout;
using std::endl;
std::regex regexp(R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9\\+/=]\{28\}\))",std::regex_constants::basic);
std::smatch match;
std::string stringified = "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
if (std::regex_search(stringified, match, regexp)) {
cout << match[1] << "," << match[2] << "," << match[3]<< endl;
} else {
cout << "No matches found" << endl;
}
return 0;
}
Or you could use:
std::regex_constants::extended
If you use std::regex_constants::extended you should not escape () and {}
If you don't want to use a raw string, you can do that as well:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})",std::regex_constants::extended);
You just have to double each backslash so that it survives the string literal. The same regex also works with the default grammar, std::regex_constants::ECMAScript:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})");
It looks like GCC only just added regex support in the development branch for GCC 4.9.
It appears that you need to use 'extended' syntax. Change regex_constants::basic to regex_constants::extended and it will match.
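In basic syntax, unescaped ( ) and { } are ordinary characters, so capturing groups and intervals have to be written as \( \) and \{ \}. Extended syntax lets you write them unescaped.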
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 10 years ago.
Just an example code from C++ Primer 5th Edition: 17.3.3. Using the Regular Expression Library
Main file main.cpp:
#include <iostream>
#include "regexcase.h"
using namespace std;
int main() {
using_regex();
return 0;
}
Header file regexcase.h:
#ifndef REGEXCASE_H_
#define REGEXCASE_H_
#include <regex>
#include <string>
void using_regex();
std::string parseCode(std::regex_constants::error_type etype);
#endif /* REGEXCASE_H_ */
Source file regexcase.cpp:
#include "regexcase.h"
#include <iostream>
using namespace std;
void using_regex() {
// look for words that violate a well-known spelling rule of thumb, "i before e, except after c":
// find the characters ei that follow a character other than c
string pattern("[^c]ei");
// we want the whole word in which our pattern appears
pattern = "[a-zA-Z]*" + pattern + "[a-zA-Z]*"; //[a-zA-Z]* [[:alpha:]]*
try {
regex r(pattern, regex_constants::extended); // construct a regex to find pattern // , regex_constants::extended
smatch results; // define an object to hold the results of a search
// define a string that has text that does and doesn't match pattern
string test_str = "receipt freind theif receive";
// use r to find a match to pattern in test_str
if (regex_search(test_str, results, r)) // if there is a match
cout << results.str() << endl; // print the matching word
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
string parseCode(regex_constants::error_type etype) {
switch (etype) {
case regex_constants::error_collate:
return "error_collate: invalid collating element request";
case regex_constants::error_ctype:
return "error_ctype: invalid character class";
case regex_constants::error_escape:
return "error_escape: invalid escape character or trailing escape";
case regex_constants::error_backref:
return "error_backref: invalid back reference";
case regex_constants::error_brack:
return "error_brack: mismatched bracket([ or ])";
case regex_constants::error_paren:
return "error_paren: mismatched parentheses(( or ))";
case regex_constants::error_brace:
return "error_brace: mismatched brace({ or })";
case regex_constants::error_badbrace:
return "error_badbrace: invalid range inside a { }";
case regex_constants::error_range:
return "error_range: invalid character range(e.g., [z-a])";
case regex_constants::error_space:
return "error_space: insufficient memory to handle this regular expression";
case regex_constants::error_badrepeat:
return "error_badrepeat: a repetition character (*, ?, +, or {) was not preceded by a valid regular expression";
case regex_constants::error_complexity:
return "error_complexity: the requested match is too complex";
case regex_constants::error_stack:
return "error_stack: insufficient memory to evaluate a match";
default:
return "";
}
}
The output of calling using_regex(); is what: regex_error; code: error_brack: mismatched bracket([ or ])
It seems that the regex can't parse the bracket.
Following the answers to this question, I used regex_constants::extended to initialize the regex object, which then is regex r(pattern, regex_constants::extended);
Then the output is no match for [[:alpha:]]*[^c]ei[[:alpha:]]*
It seems that the regex can't match the pattern.
Then I replaced the character class [[:alpha:]]* with [a-zA-Z]* (with regex_constants::extended still set). The output is still no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
Platform: windows
Tools used: Eclipse for C/C++; MinGW (g++ --version: g++ 4.7.2)
EDIT:
Thanks @sharth; I added the main file to complete the code.
I just did a test using libc++ and clang++. This works as expected. Here's my main:
int main() {
string test_str = "receipt freind theif receive";
string pattern = "[a-zA-Z]*[^c]ei[a-zA-Z]*";
try {
regex r(pattern, regex_constants::extended);
smatch results;
if (regex_search(test_str, results, r))
cout << results.str() << endl;
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
Output:
freind
On the other hand, GCC 4.7.2 gives this result:
no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
This is because libstdc++ in GCC 4.7.2 does not implement regex yet. Here's its implementation of regex_search:
template<typename _Bi_iter, typename _Allocator, typename _Ch_type, typename _Rx_traits>
inline bool regex_search(_Bi_iter __first, _Bi_iter __last, match_results<_Bi_iter, _Allocator>& __m, const basic_regex<_Ch_type, _Rx_traits>& __re, regex_constants::match_flag_type __flags) {
return false;
}
And just to note, it is very helpful to include a small program that readers could compile. That way there is no confusion about what code is being run.