std::regex not working as expected - c++

I googled around but still cannot find the error.
Why does the following code print false when I expected true?
#include <iostream>
#include <regex>
using namespace std;
int main()
{
std::string in("15\n");
std::regex r("[1-9]+[0-9]*\\n",
std::regex_constants::extended);
std::cout << std::boolalpha;
std::cout << std::regex_match(in, r) << std::endl;
}
Using regex_search instead is not an option for me.

There is an extra backslash before the "\n" in your regex. The code prints true with just that backslash removed.
#include <iostream>
#include <regex>
using namespace std;
int main()
{
std::string in("15\n");
std::regex r("[1-9]+[0-9]*\n",
std::regex_constants::extended);
std::cout << std::boolalpha;
std::cout << std::regex_match(in, r) << std::endl;
}
Edit: @rici explains why this is an issue in a comment:
POSIX-standard extended regular expressions (selected with std::regex_constants::extended) do not recognize C escape sequences such as \n. See POSIX base definitions 9.4.2: "The interpretation of an ordinary character preceded by a <backslash> ( '\' ) is undefined."

Related

Boost regex cpp for finding strings between %% with output excluding the % character itself

I am having a problem with Boost regex in C++. I want to match a string like
"Hello %world% regex %cpp%", and the expected output is world, cpp.
Can somebody suggest a regex for this?
Thanks
Anil
I personally prefer "\\%([^\\%]*)\\%" (or as a raw string R"r(\%([^\%]*)\%)r")
It doesn't rely on non-greedy quantifiers.
Which is essentially
one percent character \\%
any amount of non-percent characters [^\\%]*
one percent character \\%
I know this is tagged boost but here's a solution with std::regex
#include <string>
#include <regex>
#include <iostream>
int main()
{
using namespace std;
string source = "Hello %world%";
regex match_percent_enclosed (R"_(\%([^\%]*)\%)_");
smatch between_percent;
bool found_match = regex_search(source,between_percent,match_percent_enclosed);
if(found_match && between_percent.size()>1)
cout << "found: \"" << between_percent[1].str() << "\"." << endl;
else
cout << "no match found." << endl;
}
This may give you an idea:
%(.+?)%
Result:
Match 1
1. world
Match 2
1. cpp
You can use this regex: \%(.*?)\% (the lazy quantifier *? captures the smallest possible group).
Online regex: https://regex101.com/r/dSCE2a/2
And for the code with boost
#include <iostream>
#include <cstdlib>
#include <boost/regex.hpp>
using namespace std;
int main()
{
boost::cmatch mat;
boost::regex reg( "\\%(.*?)\\%" );
char szStr[] = "Hello %world% regex %cpp%";
const char *where = szStr;
while (regex_search(where, mat, reg))
{
cout << mat[1] << endl; // 0 for whole match, 1 for sub
where = mat[0].second; // continue right after the previous match
}
}

regex_match fails to find square brackets [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
I am trying to use regex_match on a string that has square brackets ([...]) inside it.
Things I have tried so far:
Normal matching
Backslashing the square brackets with 1 slash
Backslashing the square brackets with 2 slashes
Code to repro:
#include <iostream>
#include <cstring>
#include <regex>
using namespace std;
int main () {
std::string str1 = "a/b/c[2]/d";
std::string str2 = "(.*)a/b/c[2]/d(.*)";
std::regex e(str2);
std::cout << "str1 = " << str1 << std::endl;
std::cout << "str2 = " << str2 << std::endl;
if (regex_match(str1, e)) {
std::cout << "matched" << std::endl;
}
}
This is the error message I get every time I run it.
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
I was told by Stack Overflow members that gcc 4.8 and earlier versions are known to be buggy, so I need to update to the latest version.
I have created an Ideone fiddle where the compiler should not be an issue. Even there, regex_match does not report a match.
The main problem you have is the outdated gcc compiler: you need to upgrade to some recent version. 4.8.x just does not support regex as it should.
Now, the code you should be using is:
#include <iostream>
#include <cstring>
#include <regex>
using namespace std;
int main () {
std::string str1 = "a/b/c[2]/d";
std::string str2 = R"(a/b/c\[2]/d)";
std::regex e(str2);
std::cout << "str1 = " << str1 << std::endl;
std::cout << "str2 = " << str2 << std::endl;
if (regex_search(str1, e)) {
std::cout << "matched" << std::endl;
}
}
See the IDEONE demo
Use regex_search instead of regex_match to search for partial matches (regex_match requires a full string match).
The [2] in the regex pattern matches a literal 2 ([...] is a character class matching 1 character from the range/list specified in the character class). To match the literal square brackets, you need to escape the [ and you do not have to escape ]: R"(a/b/c\[2]/d)".
Well, they should definitely be escaped with a backslash. Unfortunately, since the backslash is itself special in a string literal, you need two backslashes. So the regex should look like "(.*)a/b/c\\[2\\]/d(.*)".
Raw string literals often simplify cases where one would otherwise have to have complex escape sequences:
#include <iostream>
#include <cstring>
#include <regex>
using namespace std;
int main () {
std::string str1 = "a/b/c[2]/d";
std::string str2 = R"regex((.*)a/b/c\[2\]/d(.*))regex";
std::regex e(str2);
std::cout << "str1 = " << str1 << std::endl;
std::cout << "str2 = " << str2 << std::endl;
if (regex_match(str1, e)) {
std::cout << "matched" << std::endl;
}
}
Expected output:
str1 = a/b/c[2]/d
str2 = (.*)a/b/c\[2\]/d(.*)
matched

C++ regex_match doesn't match

I'm using C++ on XCode. I'd like to match non-alphabet characters using regex_match but seem to be having difficulty:
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, const char * argv[])
{
cout << "BY-WORD: " << regex_match("BY-WORD", regex("[^a-zA-Z]")) << endl;
cout << "BYEWORD: " << regex_match("BYEWORD", regex("[^a-zA-Z]")) << endl;
return 0;
}
which returns:
BY-WORD: 0
BYEWORD: 0
I want "BY-WORD" to be matched (because of the hyphen), but regex_match returns a 0 for both tests.
I confoosed.
regex_match tries to match the whole input string against the regular expression you provide. Since your expression would only match a single character, it will always come back false on those inputs.
You probably want regex_search instead.
regex_match() returns whether the entire target sequence matches the regular expression. If you want to search for the non-alphabetic characters within the target sequence, you need regex_search():
#include <regex>
#include <iostream>
int main()
{
std::regex rx("[^a-zA-Z]");
std::smatch res;
std::string str("BY-WORD");
while (std::regex_search (str,res,rx)) {
std::cout <<res[0] << std::endl;
str = res.suffix().str();
}
}

Regex search & replace group in C++?

The best I can come up with is:
#include <boost/algorithm/string/replace.hpp>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main() {
string dog = "scooby-doo";
boost::regex pattern("(\\w+)-doo");
boost::smatch groups;
if (boost::regex_match(dog, groups, pattern))
boost::replace_all(dog, string(groups[1]), "scrappy");
cout << dog << endl;
}
with output:
scrappy-doo
Is there a simpler way of doing this that doesn't involve two distinct searches? Maybe with the new C++11 features (although I'm not sure gcc supports them at the moment)?
std::regex_replace should do the trick. The provided example is pretty close to your problem, even to the point of showing how to shove the answer straight into cout if you want. Pasted here for posterity:
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
int main()
{
std::string text = "Quick brown fox";
std::regex vowel_re("a|e|i|o|u");
// write the results to an output iterator
std::regex_replace(std::ostreambuf_iterator<char>(std::cout),
text.begin(), text.end(), vowel_re, "*");
// construct a string holding the results
std::cout << '\n' << std::regex_replace(text, vowel_re, "[$&]") << '\n';
}

If-Then-Else Conditionals in Regular Expressions and using capturing group

I have some difficulties in understanding if-then-else conditionals in regular expressions.
After reading If-Then-Else Conditionals in Regular Expressions I decided to write a simple test. I use C++, Boost 1.38 Regex and MS VC 8.0.
I have written this program:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
std::string str_to_modify = "123";
//std::string str_to_modify = "ttt";
boost::regex regex_to_search ("(\\d\\d\\d)");
std::string regex_format ("(?($1)$1|000)");
std::string modified_str =
boost::regex_replace(
str_to_modify,
regex_to_search,
regex_format,
boost::match_default | boost::format_all | boost::format_no_copy );
std::cout << modified_str << std::endl;
return 0;
}
I expected to get "123" if str_to_modify is "123" and "000" if str_to_modify is "ttt". However, I get ?123123|000 in the first case and nothing in the second.
Could you please tell me what is wrong with my test?
The second example that still doesn't work :
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
//std::string str_to_modify = "123";
std::string str_to_modify = "ttt";
boost::regex regex_to_search ("(\\d\\d\\d)");
std::string regex_format ("(?1foo:bar");
std::string modified_str =
boost::regex_replace(str_to_modify, regex_to_search, regex_format,
boost::match_default | boost::format_all | boost::format_no_copy );
std::cout << modified_str << std::endl;
return 0;
}
I think the format string should be (?1$1:000) as described in the Boost.Regex docs.
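Edit: I don't think regex_replace can do what you want. The format string is applied only to the parts of the input that match, so when nothing matches (as with "ttt") there is nothing to format, and with format_no_copy the output is empty. Why don't you try the following instead? regex_match will tell you whether the match succeeded (or you can use match[i].matched to check whether the i-th tagged sub-expression matched). You can format the match using the match.format member function.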
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
boost::regex regex_to_search ("(\\d\\d\\d)");
std::string str_to_modify;
while (std::getline(std::cin, str_to_modify))
{
boost::smatch match;
if (boost::regex_match(str_to_modify, match, regex_to_search))
std::cout << match.format("foo:$1") << std::endl;
else
std::cout << "error" << std::endl;
}
}