C++11 regex doesn't match the string [duplicate] - c++

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 9 years ago.
I want to parse a token that looks like this:
1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=
I use a regular expression ([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}), and it does the job when I try it on refiddle.
Then I try it with C++:
std::regex regexp(R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}))",
std::regex_constants::basic);
std::smatch match;
if (std::regex_search(stringified, match, regexp)) {
cout << match[0] << ',' << match[1] << ',' << match[2] << endl;
} else {
cout << "No matches found" << endl;
}
I compile it on Ubuntu 13.10 x64 using GCC 4.8.1 with -std=c++11 flag. But I always get No matches found. What am I doing wrong?

You were specifying the POSIX basic regex grammar. In that grammar you must escape () and {} for them to act as groups and repetition counts.
I was able to get matches with a few changes:
int main(int argc, const char * argv[]){
using std::cout;
using std::endl;
std::regex regexp(R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9\\+/=]\{28\}\))",std::regex_constants::basic);
std::smatch match;
std::string stringified = "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
if (std::regex_search(stringified, match, regexp)) {
cout << match[1] << "," << match[2] << "," << match[3]<< endl;
} else {
cout << "No matches found" << endl;
}
return 0;
}
Or you could use:
std::regex_constants::extended
If you use std::regex_constants::extended you should not escape () and {}
If you don't want to use a raw string, you can do that as well:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})",std::regex_constants::extended);
You'll just have to double up on the \\ to properly escape them. The above regex also works with the default regex grammar std::regex_constants::ECMAScript
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})");
It looks like GCC has only just added regex support, in the development branch for GCC 4.9.
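To make the grammar difference concrete, here is a minimal sketch that compiles the escaped and unescaped forms of the pattern under different grammars. The helper grammar_matches and the constants kToken, kUnescaped and kEscaped are hypothetical names introduced for illustration, the character class is simplified to [A-Za-z0-9+/=], and the sketch assumes a standard library with a working <regex> (GCC 4.9+ or libc++).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern`, compiled with `grammar`,
// matches somewhere in `text`. A pattern that the grammar rejects
// (std::regex_error) is reported as "no match".
bool grammar_matches(const std::string& pattern,
                     std::regex_constants::syntax_option_type grammar,
                     const std::string& text) {
    try {
        const std::regex re(pattern, grammar);
        return std::regex_search(text, re);
    } catch (const std::regex_error&) {
        return false;
    }
}

// The token from the question.
const std::string kToken =
    "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";

// Unescaped groups and intervals: the extended / ECMAScript spelling.
const std::string kUnescaped =
    R"(([0-9]{16}):([0-9]{5,20}):([A-Za-z0-9+/=]{28}))";

// Escaped groups and intervals: the POSIX basic spelling.
const std::string kEscaped =
    R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([A-Za-z0-9+/=]\{28\}\))";
```

Calling grammar_matches(kUnescaped, std::regex_constants::basic, kToken) reproduces the failure from the question, because basic treats the unescaped ( and { as literal characters. Pairing kEscaped with basic, or kUnescaped with extended or ECMAScript, should match.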

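It appears that you need to use 'extended' syntax. Change regex_constants::basic to regex_constants::extended and it will match.
In extended syntax, unescaped ( ) and { } denote capturing groups and repetition counts. In basic syntax they are ordinary characters unless written as \( \) and \{ \}.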
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04

Related

PCRE does not match

I suppose it's something very stupid, however this does not match, and I have no idea why.
It compiles successfully and everything, but it just doesn't match.
I've already tried RE(".*"), but that doesn't work either.
The system is OS X (installed pcre using brew).
std::string s;
if (pcrecpp::RE("h.*o").FullMatch("hello", &s))
{
std::cout << "Successful match " << s << std::endl;
}
You are trying to extract one subpattern (in &s), but have not included any parentheses to capture that subpattern. Try this (untested, note parentheses).
std::string s;
if (pcrecpp::RE("(h.*o)").FullMatch("hello", &s))
{
std::cout << "Successful match " << s << std::endl;
}
The documentation at http://www.pcre.org/original/doc/html/pcrecpp.html has a similar example, stating:
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
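The same rule (one capture group per output argument) can be checked before matching. The sketch below is not pcrecpp code. It uses std::regex::mark_count() from the standard library as an analogy, and capture_group_count is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: number of capture groups in an ECMAScript pattern,
// as reported by std::regex (not by pcrecpp).
unsigned capture_group_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

Here capture_group_count("h.*o") is 0 while capture_group_count("(h.*o)") is 1, which is why only the parenthesized pattern has a sub-pattern to copy into a single output argument.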

C++ (C++11) regular expressions differences on OS X and Linux [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 7 years ago.
I'm trying to get my code to work on both OS X and Linux the same.
The code below is compiled with clang++ --std=c++11 regextest.cpp
#include <regex>
#include <iostream>
int main()
{
std::string str = "/api/asd/";
std::string pattern = "/api/(.*)/";
std::cout << "Starting matching" << std::endl;
std::smatch matches;
if (std::regex_match(str, matches, std::regex(pattern, std::regex::egrep)))
{
std::cout << "Found match!" << std::endl;
std::cout << "All matches: ";
for (auto& it : matches)
std::cout << it << ", ";
std::cout << std::endl;
}
return 0;
}
On OS X, the result of running this code is:
Starting matching
Found match!
All matches: /api/asd/, asd,
On Linux, on the other hand (Gentoo, libstdc++ 3.3)
Starting matching
Found match!
All matches: /api/asd/, /asd/, //
How does it match /api/ on Linux? Why?
Additionally, trying to use a pattern like /api/([^/]) fails completely in Linux and matches nothing but works well in OS X.
I've tried many combinations of match types, (basic, extended, grep, egrep, awk) with escaped and unescaped ( and ) (depending on the match type) and nothing produces the expected results on Linux.
As suggested by the comments, this issue was solved by upgrading gcc to 4.9. (~amd64 flag currently required to do this on Gentoo).
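For comparison after upgrading, a small sketch like the following can collect the sub-matches so the two platforms can be checked against each other. The helper full_match_groups is a hypothetical name, and the sketch assumes a library with a working <regex> (libc++, or libstdc++ from GCC 4.9 onward).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: if `pattern` (egrep grammar) matches all of `text`,
// return every sub-match, with index 0 holding the whole match.
// Returns an empty vector when there is no match.
std::vector<std::string> full_match_groups(const std::string& text,
                                           const std::string& pattern) {
    const std::regex re(pattern, std::regex::egrep);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(text, m, re)) {
        for (const auto& sub : m) {
            groups.push_back(sub.str());
        }
    }
    return groups;
}
```

With a conforming implementation, full_match_groups("/api/asd/", "/api/(.*)/") should yield the two entries shown in the OS X output above, and the stricter pattern "/api/([^/]*)/" should give the same result.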

c++ regex search pattern not found

Following the example here, I wrote the following code:
using namespace std::regex_constants;
std::string str("{trol,asdfsad},{safsa, aaaaa,aaaaadfs}");
std::smatch m;
std::regex r("\\{(.*)\\}"); // matches anything between {}
std::cout << "Initiating search..." << std::endl;
while (std::regex_search(str, m, r)) {
for (auto x : m) {
std::cout << x << " ";
}
std::cout << std::endl;
str = m.suffix().str();
}
But to my surprise, it doesn't find anything at all which I fail to understand. I would understand if the regex matches whole string since .* is greedy but nothing at all? What am I doing wrong here?
To be clear - I know that regexes are not suitable for Parsing BUT I won't deal with more levels of bracket nesting and therefore I find usage of regexes good enough.
If you want to use basic POSIX syntax, your regex should be
{\\(.*\\)}
If you want to use the default ECMAScript syntax, your regex should be
\\{(.*)\\}
With clang and libc++, or with GCC 4.9+ (earlier GCC releases do not fully support <regex>), your code gives:
Initiating search...
{trol,asdfsad},{safsa, aaaaa,aaaaadfs} trol,asdfsad},{safsa, aaaaa,aaaaadfs
Live example on coliru
Eventually it turned out that the problem really was the GCC version, so I finally got it working using the boost::regex library and the following code:
std::string str("{trol,asdfsad},{safsa,aaaaa,aaaaadfs}");
boost::regex rex("\\{(.*?)\\}", boost::regex_constants::perl);
boost::smatch result;
while (boost::regex_search(str, result, rex)) {
for (std::size_t i = 0; i < result.size(); ++i) {
std::cout << result[i] << " ";
}
std::cout << std::endl;
str = result.suffix().str();
}
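On a toolchain where <regex> works (GCC 4.9+ or libc++), the same idea can be sketched with the standard library alone. This is an illustration rather than the code the asker ended up using: brace_groups is a hypothetical helper, and it swaps the manual suffix loop for std::sregex_iterator, with a lazy .*? so that each {...} group is matched separately.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text between each pair of braces.
// The lazy quantifier .*? stops at the first closing brace instead of
// the last one, so "{a},{b}" produces two results rather than one.
std::vector<std::string> brace_groups(const std::string& text) {
    static const std::regex re(R"(\{(.*?)\})");  // default ECMAScript grammar
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back((*it)[1].str());  // capture group 1: inside the braces
    }
    return out;
}
```

Iterating this way also avoids copying the remaining suffix back into the string on every pass.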

Define regexp pattern as unsigned char

Is there any way I can create a regex pattern that contains an unsigned char? I've tried:
regex* r = new regex("\\xff");
which results in an exception saying the pattern character is out of range. I've also tried to define my own basic_regex and my own regex_traits following the code in the regex include file but that results in a strange error in the local include.
Any help would be appreciated.
This is a legal HexEscapeSequence in ECMAScript regex (which is what C++ uses by default), and it appears to work on my tests:
#include <iostream>
#include <regex>
int main()
{
std::regex re("\\xff");
std::string test = "abc\xff";
std::smatch m;
regex_search(test, m, re);
std::cout << "Found " << m.size() << " match after '"
<< m.prefix() << "'\n";
}
clang++/libc++: Found 1 match after 'abc'
g++/boost.regex (replacing std:: with boost::): Found 1 match after 'abc'
Which implementation are you using?

c++ 11 regex error [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 10 years ago.
Just an example code from C++ Primer 5th Edition: 17.3.3. Using the Regular Expression Library
Main file main.cpp:
#include <iostream>
#include "regexcase.h"
using namespace std;
int main() {
using_regex();
return 0;
}
Header file regexcase.h:
#ifndef REGEXCASE_H_
#define REGEXCASE_H_
#include <regex>
#include <string>
void using_regex();
std::string parseCode(std::regex_constants::error_type etype);
#endif /* REGEXCASE_H_ */
Source file regexcase.cpp:
#include "regexcase.h"
#include <iostream>
using namespace std;
void using_regex() {
// look for words that violate a well-known spelling rule of thumb, "i before e, except after c":
// find the characters ei that follow a character other than c
string pattern("[^c]ei");
// we want the whole word in which our pattern appears
pattern = "[a-zA-Z]*" + pattern + "[a-zA-Z]*"; //[a-zA-Z]* [[:alpha:]]*
try {
regex r(pattern, regex_constants::extended); // construct a regex to find pattern // , regex_constants::extended
smatch results; // define an object to hold the results of a search
// define a string that has text that does and doesn't match pattern
string test_str = "receipt freind theif receive";
// use r to find a match to pattern in test_str
if (regex_search(test_str, results, r)) // if there is a match
cout << results.str() << endl; // print the matching word
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
string parseCode(regex_constants::error_type etype) {
switch (etype) {
case regex_constants::error_collate:
return "error_collate: invalid collating element request";
case regex_constants::error_ctype:
return "error_ctype: invalid character class";
case regex_constants::error_escape:
return "error_escape: invalid escape character or trailing escape";
case regex_constants::error_backref:
return "error_backref: invalid back reference";
case regex_constants::error_brack:
return "error_brack: mismatched bracket([ or ])";
case regex_constants::error_paren:
return "error_paren: mismatched parentheses(( or ))";
case regex_constants::error_brace:
return "error_brace: mismatched brace({ or })";
case regex_constants::error_badbrace:
return "error_badbrace: invalid range inside a { }";
case regex_constants::error_range:
return "error_range: invalid character range(e.g., [z-a])";
case regex_constants::error_space:
return "error_space: insufficient memory to handle this regular expression";
case regex_constants::error_badrepeat:
return "error_badrepeat: a repetition character (*, ?, +, or {) was not preceded by a valid regular expression";
case regex_constants::error_complexity:
return "error_complexity: the requested match is too complex";
case regex_constants::error_stack:
return "error_stack: insufficient memory to evaluate a match";
default:
return "";
}
}
The output of calling using_regex(); is what: regex_error; code: error_brack: mismatched bracket([ or ])
It seems that the regex can't parse the bracket.
Following the answers to this question, I used regex_constants::extended to initialize the regex object, so the line becomes regex r(pattern, regex_constants::extended);
Then the output is no match for [[:alpha:]]*[^c]ei[[:alpha:]]*
It seems that the regex can't match the pattern.
Then I use [a-zA-Z]* to replace character class [[:alpha:]]* (with regex_constants::extended still set). The output still is no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
Platform: windows
Tools used: Eclipse for C/C++; MinGW (g++ --version: g++ 4.7.2)
EDIT:
Thanks @sharth, I added the main file to complete the code.
I just did a test using libc++ and clang++. This works as expected. Here's my main:
int main() {
string test_str = "receipt freind theif receive";
string pattern = "[a-zA-Z]*[^c]ei[a-zA-Z]*";
try {
regex r(pattern, regex_constants::extended);
smatch results;
if (regex_search(test_str, results, r))
cout << results.str() << endl;
else
cout << "no match for " << pattern << endl;
} catch (regex_error &e) {
cout << "what: " << e.what() << "; code: " << parseCode(e.code()) << endl;
}
}
Output:
freind
On the other hand GCC 4.7.2, gives this result:
no match for [a-zA-Z]*[^c]ei[a-zA-Z]*
This is because GCC 4.7.2's libstdc++ still doesn't implement regex. Here's its implementation of regex_search:
template<typename _Bi_iter, typename _Allocator, typename _Ch_type, typename _Rx_traits>
inline bool regex_search(_Bi_iter __first, _Bi_iter __last, match_results<_Bi_iter, _Allocator>& __m, const basic_regex<_Ch_type, _Rx_traits>& __re, regex_constants::match_flag_type __flags) {
return false;
}
And just to note, it is very helpful to include a small program that readers could compile. That way there is no confusion about what code is being run.
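Because a stubbed-out regex_search simply returns false, one way to detect such a library at run time is to probe it with a match that must succeed. The sketch below is only an illustration of that idea, and std_regex_is_functional is a hypothetical name. It reuses the word "freind" and the pattern [^c]ei from the example above.

```cpp
#include <regex>
#include <string>

// Hypothetical runtime probe: returns true only if the standard library's
// regex engine finds a match that is known to exist. A stub that always
// returns false, or a constructor that throws, yields false.
bool std_regex_is_functional() {
    try {
        const std::string probe = "freind";
        const std::regex re("[^c]ei", std::regex_constants::extended);
        return std::regex_search(probe, re);
    } catch (const std::regex_error&) {
        return false;
    }
}
```

A program could call this once at startup and fall back to another regex library, such as boost::regex, when it returns false.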