Bug in std::regex? - c++

Here is the code:
#include <string>
#include <regex>
#include <iostream>
int main()
{
    std::string pattern("[^c]ei");
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
    std::regex r(pattern);
    std::smatch results;
    std::string test_str = "cei";
    if (std::regex_search(test_str, results, r))
        std::cout << results.str() << std::endl;
    return 0;
}
Output :
The compiler used is gcc 4.9.1.
I'm a newbie learning regular expressions. I expected nothing to be output, since "cei" doesn't match the pattern here. Am I doing it right? What's the problem?
This one has been reported and confirmed as a bug; for details, please visit here:

It's a bug in the implementation. Not only do a couple other tools I tried agree that your pattern does not match your input, but I tried this:
#include <string>
#include <regex>
#include <iostream>
int main()
{
    std::string pattern("([a-z]*)([a-z])(e)(i)([a-z]*)");
    std::regex r(pattern);
    std::smatch results;
    std::string test_str = "cei";
    if (std::regex_search(test_str, results, r))
        std::cout << results.str() << std::endl;
    // Print every sub-match; index 0 is the whole match.
    for (size_t i = 0; i < results.size(); ++i) {
        std::ssub_match sub_match = results[i];
        std::string sub_match_str = sub_match.str();
        std::cout << i << ": " << sub_match_str << '\n';
    }
    return 0;
}
This is basically similar to what you had, but I replaced [:alpha:] with [a-z] for simplicity, and I also temporarily replaced [^c] with [a-z] because that seems to make it work correctly. Here's what it prints (GCC 4.9.0 on Linux x86-64):
0: cei
2: c
3: e
4: i
If I replace [a-z] where you had [^c] and just put f there instead, it correctly says the pattern doesn't match. But if I use [^c] like you did:
std::string pattern("([a-z]*)([^c])(e)(i)([a-z]*)");
Then I get this output:
0: cei
1: cei
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
Aborted (core dumped)
So it claims to match successfully, and results[0] is "cei" which is expected. Then, results[1] is "cei" also, which I guess might be OK. But then results[2] crashes, because it tries to construct a std::string of length 18446744073709551614 with begin=nullptr. And that giant number is exactly 2^64 - 2, aka std::string::npos - 1 (on my system).
So I think there is an off-by-one error somewhere, and the impact can be much more than just a spurious regex match--it can crash at runtime.

The regex is correct and should not match the string "cei".
The regex can be tested and explained best in Perl:
my $regex = qr{ # start regular expression
[[:alpha:]]* # 0 or any number of alpha chars
[^c] # followed by NOT-c character
ei # followed by e and i characters
[[:alpha:]]* # followed by 0 or any number of alpha chars
}x; # end + declare 'x' mode (ignore whitespace)
print "xei" =~ /$regex/ ? "match\n" : "no match\n";
print "cei" =~ /$regex/ ? "match\n" : "no match\n";
The regex will first consume all chars to the end of the string ([[:alpha:]]*), then backtrack to find the NON-c char [^c] and proceed with the e and i matches (by backtracking another time).
"xei" --> match
"cei" --> no match
for obvious reasons. Any discrepancies to this in various C++ libraries and testing tools are the problem of the implementation there, imho.
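The same check can be written in C++ for comparison with the Perl result. This is only a minimal sketch: it assumes a standard library whose <regex> is not affected by the bug discussed above, and the helper name matches_pattern is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original posts): true if `text`
// contains a match for the question's pattern
// [[:alpha:]]*[^c]ei[[:alpha:]]*.
bool matches_pattern(const std::string& text)
{
    static const std::regex r("[[:alpha:]]*[^c]ei[[:alpha:]]*");
    return std::regex_search(text, r);
}
```

On a conforming implementation, matches_pattern("xei") should be true and matches_pattern("cei") should be false, because the only "ei" in "cei" is preceded by a c.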


Regex - How to capture all iterations of a repeating pattern? [duplicate]

I'm using the C++ tr1::regex with the ECMA regex grammar. What I'm trying to do is parse a header and return values associated with each item in the header.
-Testing some text
-Numbers 1 2 5
-MoreStuff some more text
-Numbers 1 10
What I would like to do is find all of the "-Numbers" lines and put each number into its own result with a single regex. As you can see, the "-Numbers" lines can have an arbitrary number of values on the line. Currently, I'm just searching for "-Numbers([\s0-9]+)" and then tokenizing that result. I was just wondering if there was any way to both find and tokenize the results in a single regex.
No, there is not.
I was about to ask this exact same question, and I kind of found a solution.
Let's say you have an arbitrary number of words you want to capture.
"there are four lights"
"captain picard is the bomb"
You might think that the solution is:
But this will only match the whole input string and the last captured group.
What you can do is use the "g" switch.
So, an example in Perl:
use strict;
use warnings;
my $str1 = "there are four lights";
my $str2 = "captain picard is the bomb";
foreach ( $str1, $str2 ) {
    my @a = ( $_ =~ /(\w+)\s?/g );
    print "captured groups are: " . join( "|", @a ) . "\n";
}
Output is:
captured groups are: there|are|four|lights
captured groups are: captain|picard|is|the|bomb
So, there is a solution if your language of choice supports an equivalent of "g" (and I guess most do...).
Hope this helps someone who was in the same position as me!
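Since the question is about C++, the closest counterpart to Perl's "g" switch there is probably std::sregex_iterator, which walks over every match in turn. The following is a rough sketch rather than a definitive implementation; the helper name capture_words is an assumption made for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every word in `text`, one per match,
// much like Perl's /(\w+)\s?/g in list context.
std::vector<std::string> capture_words(const std::string& text)
{
    static const std::regex word(R"((\w+)\s?)");
    std::vector<std::string> words;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it)
    {
        words.push_back((*it)[1].str());  // group 1 is the word itself
    }
    return words;
}
```

Calling capture_words("there are four lights") should give the four words as separate elements, which mirrors the Perl output above.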
The problem is that the desired solution insists on using capture groups. C++ provides regex_token_iterator, which handles this better (C++11 example):
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
    std::regex e (R"((?:^-Numbers)?\s*(\d+))");
    string input;
    while (getline(cin, input)) {
        // Iterate over sub-match 1 (the digits) of every match in the line.
        std::regex_token_iterator<std::string::iterator> a{
            input.begin(), input.end(),
            e, 1
        };
        std::regex_token_iterator<std::string::iterator> end;
        while (a != end) {
            cout << *a << " - ";
            ++a;  // advance to the next number
        }
        cout << '\n';
    }
    return 0;
}

std::regex_match and lazy quantifier with strange behavior

I know that:
Lazy quantifier matches: As Few As Possible (shortest match)
Also know that the constructor:
basic_regex( ...,
flag_type f = std::regex_constants::ECMAScript );
ECMAScript supports non-greedy matches,
and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ...
At most one grammar option must be chosen out of ECMAScript,
basic, extended, awk, grep, egrep. If no grammar is chosen,
ECMAScript is assumed to be selected ...
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences...std::regex_match
Here is my code:
#include <iostream>
#include <string>
#include <regex>
int main(){
    std::string string( "s/one/two/three/four/five/six/g" );
    std::match_results< std::string::const_iterator > match;
    std::basic_regex< char > regex ( "s?/.+?/g?" ); // non-greedy
    bool test = false;
    using namespace std::regex_constants;
    // okay recognize the lazy operator .+?
    test = std::regex_search( string, match, regex );
    std::cout << test << '\n';
    std::cout << match.str() << '\n';
    // does not recognize the lazy operator .+?
    test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
    std::cout << test << '\n';
    std::cout << match.str() << '\n';
    return 0;
}
and the output:
Process returned 0 (0x0) execution time : 0.008 s
Press ENTER to continue.
std::regex_match should not match anything; it should return 0 with the non-greedy quantifier .+?
In fact, here the non-greedy .+? quantifier behaves the same as the greedy one, and both /.+?/ and /.+/ match the same string, even though they are different patterns.
So the question is: why is the question mark ignored?
Fast test:
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
$ s/one/
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
$ s/one/two/three/four/five/six/g
This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); (non-greedy)
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); (greedy)
produce the same output with std::regex_match: both match the entire string!
With std::regex_search, however, they produce different output.
Also, s? and g? do not matter; with /.*?/ it still matches the entire string!
More Detail
g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901
I don't see any inconsistency. regex_match tries to match the whole string, so s?/.+?/g? lazily expands till the whole string is covered.
These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:
a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba # lets try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".
a.*a: ababa
a|.*a: a|baba
a.*|a: ababa| # try .* == "baba" first
# backtrack
a.*|a: abab|a # try .* == "bab" now
a.*a|: ababa|
And regex_match( abc ) is like regex_search( ^abc$ ) in this case.
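The difference can be checked directly in C++. The sketch below is only an illustration of the point made above; the two helper names (first_search_match and matches_whole) are assumptions and not part of the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text of the first match found by regex_search,
// or an empty string if there is no match.
std::string first_search_match(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}

// Hypothetical helper: true if the pattern can be made to cover the
// entire text, which is what regex_match requires.
bool matches_whole(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

With the input from the question, first_search_match returns "s/one/" for "s?/.+?/g?" and the whole string for "s?/.+/g?", while matches_whole is true for both: laziness changes which match is preferred, not whether a full-string match exists.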

C++11 regex matching capturing group multiple times

Could someone please help me to extract the text between the : and the ^ symbols using a JavaScript (ECMAScript) regular expression in C++11. I do not need to capture the hw-descriptor itself - but it does have to be present in the line in order for the rest of the line to be considered for a match. Also the :p....^, :m....^ and :u....^ can arrive in any order and there has to be at least 1 present.
I tried using the following regular expression:
static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
against the following text line:
Here is the code, which is posted on a live coliru. It shows how I attempted to solve this problem; however, I am only getting 1 match. I need to see how to extract each of the potential 3 matches corresponding to the p, m or u characters described earlier.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main()
{
    static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^";
    // I seem to only get 1 match here, I was expecting
    // to loop through each of the matches, looks like I need something like
    // a pcre global option but I don't know how.
    std::for_each(std::sregex_iterator(foo.cbegin(), foo.cend(), gRegex), std::sregex_iterator(),
        [&](const auto& rMatch) {
            for (int i=0; i< static_cast<int>(rMatch.size()); ++i) {
                std::cout << rMatch[i] << std::endl;
            }
        }
    );
    return 0;
}
The above program gives the following output:
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
With std::regex, you cannot keep multiple repeated captures when matching a string with consecutive repeated patterns: a repeated group only retains its last capture.
What you may do is to match the overall texts containing the prefix and the repeated chunks, capture the latter into a separate group, and then use a second smaller regex to grab all the occurrences of the substrings you want separately.
The first regex here may be
hw-descriptor((?::[pmu][^^]*\^)+)
It matches hw-descriptor, and ((?::[pmu][^^]*\^)+) captures into Group 1 one or more repetitions of the :[pmu][^^]*\^ pattern: a :, then p/m/u, then 0 or more chars other than ^, and then ^. Upon finding a match, use the :[pmu][^^]*\^ regex to return all the real "matches".
C++ demo:
#include <iostream>
#include <regex>
#include <string>
int main()
{
    static const std::regex gRegex("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
    static const std::regex lRegex(":[pmu][^^]*\\^", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^ hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^";
    for(std::sregex_iterator i = std::sregex_iterator(foo.begin(), foo.end(), gRegex);
        i != std::sregex_iterator();
        ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << std::endl;
        std::string x = m.str(1);  // Group 1: all the repeated chunks
        for(std::sregex_iterator j = std::sregex_iterator(x.begin(), x.end(), lRegex);
            j != std::sregex_iterator();
            ++j)
        {
            std::cout << "Element value: " << (*j).str() << std::endl;
        }
    }
    return 0;
}
Match value: hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
Element value: :pTEXT1^
Element value: :mTEXT2^
Element value: :uTEXT3^
Match value: hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^
Element value: :pTEXT8^
Element value: :mTEXT8^
Element value: :uTEXT83^
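If only the text after the p/m/u letter is wanted (as with Group 2 in the question's original regex), the same two-step idea can be wrapped in a function with a capture group in the inner regex. This is a sketch under that assumption; the name extract_fields and the inner pattern with its capture group are made up here and do not come from the answer above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text between ":p"/":m"/":u" and "^"
// for every chunk that directly follows a "hw-descriptor" prefix.
std::vector<std::string> extract_fields(const std::string& line)
{
    static const std::regex outer("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
    // Assumed variation of the inner regex: the parentheses capture the payload only.
    static const std::regex inner(":[pmu]([^^]*)\\^", std::regex::icase);

    std::vector<std::string> fields;
    for (std::sregex_iterator i(line.begin(), line.end(), outer), end; i != end; ++i)
    {
        const std::string chunks = (*i).str(1);  // all repeated chunks of this match
        for (std::sregex_iterator j(chunks.begin(), chunks.end(), inner); j != end; ++j)
        {
            fields.push_back((*j)[1].str());
        }
    }
    return fields;
}
```

For "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^" this should return TEXT1, TEXT2 and TEXT3 as separate elements, and an empty vector when the hw-descriptor prefix is absent.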

Regular expression validation fails while egrep validates just fine

I'm trying to use regular expressions to validate strings, so before I go any further, let me first explain what the strings look like: an optional number of digits followed by an 'X' and an optional ('^' followed by one or more digits).
Here are some examples: "2X", "X", "23X^6" fit the pattern while strings like "X^", "4", "foobar", "4X^", "4X44" don't.
Now, where was I: using 'egrep' and the "^[0-9]{0,}\X(\^[0-9]{1,})$" regex I can validate those strings just fine; however, when trying this in C++ using the C++11 regex library, it fails.
Here's the code I'm using to validate those strings:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int main()
{
    // The second constructor argument (syntax flag) is not shown in the post;
    // without one, the default ECMAScript grammar applies.
    std::regex r("^[0-9]{0,}\\X(\\^[0-9]{1,})$");
    std::vector<std::string> challanges_ok {"2X", "X", "23X^66", "23X^6",
                                            "3123X", "2313131X^213213123"};
    std::vector<std::string> challanges_bad {"X^", "4", "asdsad", " X",
                                             "4X44", "4X^"};
    std::cout << "challanges_ok: ";
    for (auto &str : challanges_ok) {
        std::cout << std::regex_match(str, r) << " ";
    }
    std::cout << "\nchallanges_bad: ";
    for (auto &str : challanges_bad) {
        std::cout << std::regex_match(str, r) << " ";
    }
    std::cout << "\n";
    return 0;
}
Am I doing something wrong or am I missing something? I'm compiling under GCC 4.7.
Your regex fails to make the '^' followed by one or more digits optional; change it to:
Also note that this page says that GCC's support of <regex> is only partial, so std::regex may not work at all for you ('partial' in this context apparently means 'broken'); have you tried Boost.Xpressive or Boost.Regex as a sanity check?
optional number of digits followed by an 'X' and an optional ('^' followed by one or more digits).
OK, the regular expression in your code doesn't match that description, for two reasons: you have an extra backslash on the X, and the '^digits' part is not optional. The regex you want is this:
^[0-9]{0,}X(\^[0-9]{1,}){0,1}$
which means your grep command should look like this (note single quotes):
egrep '^[0-9]{0,}X(\^[0-9]{1,}){0,1}$' filename
And the string you have to pass in your C++ code is this:
"^[0-9]{0,}X(\\^[0-9]{1,}){0,1}$"
If you then replace all the explicit quantifiers with their more traditional abbreviations, you get @ildjarn's answer: {0,} is *, {1,} is +, and {0,1} is ?.
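Put together, the abbreviated form can be tried in C++ as follows. This is a minimal sketch, assuming the default ECMAScript grammar and a standard library with a complete <regex> implementation (for GCC, that means 4.9 or later); the helper name is_valid_term is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: optional digits, then 'X', then an optional
// '^' followed by one or more digits.
bool is_valid_term(const std::string& s)
{
    // Abbreviated quantifiers: {0,} -> *, {1,} -> +, {0,1} -> ?
    static const std::regex r("^[0-9]*X(\\^[0-9]+)?$");
    return std::regex_match(s, r);
}
```

With this pattern, the strings from the question's "ok" list should be accepted and those from the "bad" list rejected.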

