Boost.Regex oddity - c++

Does anyone have any idea why the following code would output "no match"?
boost::regex r(".*\\.");
std::string s("app.test");
if (boost::regex_match(s, r))
std::cout << "match" << std::endl;
else
std::cout << "no match" << std::endl;

I believe regex_match() matches against the entire string. Try regex_search() instead.
It would have worked with the following regex:
boost::regex r(".*\\..*");
and the regex_match() function. But again, regex_search() is what you're probably looking for.

Related

PCRE does not match

I suppose it's something very stupid, however this does not match, and I have no idea why.
I compiles successfully and everything, but it just doesn't match.
I've already used RE(".*") but it doesn't work as well.
The system is OS X (installed pcre using brew).
std::string s;
if (pcrecpp::RE("h.*o").FullMatch("hello", &s))
{
std::cout << "Successful match " << s << std::endl;
}
You are trying to extract one subpattern (in &s), but have not included any parentheses to capture that subpattern. Try this (untested, note parentheses).
std::string s;
if (pcrecpp::RE("(h.*o)").FullMatch("hello", &s))
{
std::cout << "Successful match " << s << std::endl;
}
The documentation at http://www.pcre.org/original/doc/html/pcrecpp.html has a similar example, stating:
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\w+:\d+").FullMatch("ruby:1234", &s);

c++ regex search pattern not found

Following the example here I wrote following code:
using namespace std::regex_constants;
std::string str("{trol,asdfsad},{safsa, aaaaa,aaaaadfs}");
std::smatch m;
std::regex r("\\{(.*)\\}"); // matches anything between {}
std::cout << "Initiating search..." << std::endl;
while (std::regex_search(str, m, r)) {
for (auto x : m) {
std::cout << x << " ";
}
std::cout << std::endl;
str = m.suffix().str();
}
But to my surprise, it doesn't find anything at all which I fail to understand. I would understand if the regex matches whole string since .* is greedy but nothing at all? What am I doing wrong here?
To be clear - I know that regexes are not suitable for Parsing BUT I won't deal with more levels of bracket nesting and therefore I find usage of regexes good enough.
If you want to use basic posix syntax, your regex should be
{\\(.*\\)}
If you want to use default ECMAScript, your regex should be
\\{(.*)\\}
with clang and libc++ or with gcc 4.9+ (since only it fully support regex) your code give:
Initiating search...
{trol,asdfsad},{safsa, aaaaa,aaaaadfs} trol,asdfsad},{safsa, aaaaa,aaaaadfs
Live example on coliru
Eventually it turned out to really be problem with gcc version so I finally got it working using boost::regex library and following code:
std::string str("{trol,asdfsad},{safsa,aaaaa,aaaaadfs}");
boost::regex rex("\\{(.*?)\\}", boost::regex_constants::perl);
boost::smatch result;
while (boost::regex_search(str, result, rex)) {
for (uint i = 0; i < result.size(); ++i) {
std::cout << result[i] << " ";
}
std::cout << std::endl;
str = result.suffix().str();
}

C++11 regex doesn't match the string [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 9 years ago.
I want to parse a token that looks like this:
1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=
I use a regular expression ([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}), and it does the job when I try it on refiddle.
Then I try it with C++:
std::regex regexp(R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}))",
std::regex_constants::basic);
std::smatch match;
if (std::regex_search(stringified, match, regexp)) {
cout << match[0] << ',' << match[1] << ',' << match[2] << endl;
} else {
cout << "No matches found" << endl;
}
I compile it on Ubuntu 13.10 x64 using GCC 4.8.1 with -std=c++11 flag. But I always get No matches found. What am I doing wrong?
You were specifying POSIX basic regex, in that format you must escape () and {}
I was able to get get matches with a few changes:
int main(int argc, const char * argv[]){
using std::cout;
using std::endl;
std::regex regexp(R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9\\+/=]\{28\}\))",std::regex_constants::basic);
std::smatch match;
std::string stringified = "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
if (std::regex_search(stringified, match, regexp)) {
cout << match[1] << "," << match[2] << "," << match[3]<< endl;
} else {
cout << "No matches found" << endl;
}
return 0;
}
Or you could use:
std::regex_constants::extended
If you use std::regex_constants::extended you should not escape () and {}
If you don't want to use a raw string, you can do that as well:
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})",std::regex_constants::extended);
You'll just have to double up on the \\ to properly escape them. The above regex also works with the default regex grammar std::regex_constants::ECMAScript
std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})");
It looks like GCC just added regex supported in their development branch of GCC 4.9.
It appears that you need to use 'extended' syntax. Change regex_constants::basic to regex_constants::extended and it will match.
You need extended syntax in order to perform capturing.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04

Constructing boost regex

I want to match every single number in the following string:
-0.237522264173E+01 0.110011117918E+01 0.563118085683E-01 0.540571836345E-01 -0.237680494785E+01 0.109394729137E+01 -0.237680494785E+01 0.109394729137E+01 0.392277532367E+02 0.478587433035E+02
However, for some reason the following boost::regex doesn't work:
(.*)(-?\\d+\\.\\d+E\\+\\d+ *){10}(.*)
What's wrong with it?
EDIT: posting relevant code:
std::ifstream plik("chains/peak-summary.txt");
std::string mystr((std::istreambuf_iterator<char>(plik)), std::istreambuf_iterator<char>());
plik.close();
boost::cmatch what;
boost::regex expression("(.*)(-?\\d+\\.\\d+E\\+\\d+ *){10}(.*)");
std::cout << "String to match against: \"" << mystr << "\"" << std::endl;
if(regex_match(mystr.c_str(), what, expression))
{
std::cout << "Match!";
std::cout << std::endl << what[0] << std::endl << what[1] << std::endl;
} else {
std::cout << "No match." << std::endl;
}
output:
String to match against: " -0.237555275450E+01 0.109397523269E+01 0.560420828508E-01 0.556732715285E-01 -0.237472295761E+01 0.110192835331E+01 -0.237472295761E+01 0.110192835331E+01 0.393040553508E+02 0.478540190640E+02
"
No match.
Also posting the contents of file read into the string:
[dare2be#schroedinger multinest-peak]$ cat chains/peak-summary.txt
-0.237555275450E+01 0.109397523269E+01 0.560420828508E-01 0.556732715285E-01 -0.237472295761E+01 0.110192835331E+01 -0.237472295761E+01 0.110192835331E+01 0.393040553508E+02 0.478540190640E+02
The (.*) around your regex match and consume all text at the start and end of the string, so if there are more than ten numbers, the first ones won't be matched.
Also, you're not allowing for negative exponents.
(-?\\d\\.\\d+E[+-]\\d+ *){10,}
should work.
This will match all of the numbers in a single string; if you want to match each number separately, you have to use (-?\\d\\.\\d+E[+-]\\d+) iteratively.
Try with:
(-?[0-9]+\\.[0-9]+E[+-][0-9]+)
Your (.*) in the beggining matches greedy whole string.

How can I access all matches of a repeated capture group, not just the last one?

My code is:
#include <boost/regex.hpp>
boost::cmatch matches;
boost::regex_match("alpha beta", matches, boost::regex("([a-z])+"));
cout << "found: " << matches.size() << endl;
And it shows found: 2 which means that only ONE occurrence is found… How to instruct it to find THREE occurrences? Thanks!
You should not call matches.size() before verifying that something was matched, i.e. your code should look rather like this:
#include <boost/regex.hpp>
boost::cmatch matches;
if (boost::regex_match("alpha beta", matches, boost::regex("([a-z])+")))
cout << "found: " << matches.size() << endl;
else
cout << "nothing found" << endl;
The output would be "nothing found" because regex_match tries to match the whole string. What you want is probably regex_search that is looking for substring. The code below could be a bit better for you:
#include <boost/regex.hpp>
boost::cmatch matches;
if (boost::regex_search("alpha beta", matches, boost::regex("([a-z])+")))
cout << "found: " << matches.size() << endl;
else
cout << "nothing found" << endl;
But will output only "2", i.e. matches[0] with "alpha" and matches[1] with "a" (the last letter of alpha - the last group matched)
To get the whole word in the group you have to change the pattern to ([a-z]+) and call the regex_search repeatedly as you did in your own answer.
Sorry to reply 2 years late, but if someone googles here as I did, then maybe it will be still useful for him...
This is what I've found so far:
text = "alpha beta";
string::const_iterator begin = text.begin();
string::const_iterator end = text.end();
boost::match_results<string::const_iterator> what;
while (regex_search(begin, end, what, boost::regex("([a-z]+)"))) {
cout << string(what[1].first, what[2].second-1);
begin = what[0].second;
}
And it works as expected. Maybe someone knows a better solution?
This works for me, maybe somebody will find it usefull..
std::string arg = "alpha beta";
boost::sregex_iterator it{arg.begin(), arg.end(), boost::regex("([a-z])+")};
boost::sregex_iterator end;
for (; it != end; ++it) {
std::cout << *it << std::endl;
}
Prints:
alpha
beta