regex_match fails to find square brackets [duplicate] - c++

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 6 years ago.
I am trying to use regex_match on a string that has square brackets ([...]) inside it.
Things I have tried so far:
Normal matching
Backslashing the square brackets with 1 slash
Backslashing the square brackets with 2 slashes
Code to repro:
#include <iostream>
#include <cstring>
#include <regex>
using namespace std;
int main () {
std::string str1 = "a/b/c[2]/d";
std::string str2 = "(.*)a/b/c[2]/d(.*)";
std::regex e(str2);
std::cout << "str1 = " << str1 << std::endl;
std::cout << "str2 = " << str2 << std::endl;
if (regex_match(str1, e)) {
std::cout << "matched" << std::endl;
}
}
This is the error message I get every time I run it:
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
Stack Overflow members told me that gcc 4.8 and earlier versions are known to be buggy, so I would need to update to the latest version.
I have created an Ideone fiddle, where the compiler should not be an issue. Even there, regex_match does not match.

The main problem is the outdated gcc compiler: you need to upgrade to a recent version. gcc 4.8.x does not support regex as it should.
Now, the code you should be using is:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main () {
std::string str1 = "a/b/c[2]/d";
std::string str2 = R"(a/b/c\[2]/d)";
std::regex e(str2);
std::cout << "str1 = " << str1 << std::endl;
std::cout << "str2 = " << str2 << std::endl;
if (regex_search(str1, e)) {
std::cout << "matched" << std::endl;
}
}
See the IDEONE demo
Use regex_search instead of regex_match to search for partial matches (regex_match requires a full string match).
The [2] in the regex pattern matches a literal 2 ([...] is a character class matching 1 character from the range/list specified in the character class). To match the literal square brackets, you need to escape the [ and you do not have to escape ]: R"(a/b/c\[2]/d)".

Well, they should definitely be escaped with a backslash. Unfortunately, since a backslash is itself special in a string literal, you need two backslashes. So the regex should look like "(.*)a/b/c\\[2\\]/d(.*)".

Raw string literals often simplify cases where one would otherwise have to have complex escape sequences:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main () {
std::string str1 = "a/b/c[2]/d";
// Raw string: the regex backslashes are written once, with no C++ escaping.
std::string str2 = R"regex((.*)a/b/c\[2\]/d(.*))regex";
std::regex e(str2);
std::cout << "str1 = " << str1 << std::endl;
std::cout << "str2 = " << str2 << std::endl;
if (regex_match(str1, e)) {
std::cout << "matched" << std::endl;
}
}
expected output:
str1 = a/b/c[2]/d
str2 = (.*)a/b/c\[2\]/d(.*)
matched

Related

Regex match the entire string [duplicate]

I'm reading a text file in the form of
People list
[Jane]
Female
31
...
and for each line I want to loop through and find the line that contains "[...]"
For example, [Jane]
I came up with the regex expression
"(^[\w+]$)"
which I tested that it works using regex101.com.
However, when I try to use that in my code, it fails to match with anything.
Here's my code:
void Jane::JaneProfile() {
// read each line, for each [title], add the next lines into its array
std::smatch matches;
for(int i = 0; i < m_numberOfLines; i++) { // #lines in text file
std::regex pat ("(^\[\w+\]$)");
if(regex_search(m_lines.at(i), matches, pat)) {
std::cout << "smatch " << matches.str(0) << std::endl;
std::cout << "smatch.size() = " << matches.size() << std::endl;
} else
std::cout << "wth" << std::endl;
}
}
When I run this code, every line falls through to the else branch and nothing matches.
I searched for answers, but I got confused when I saw that in C++ you have to use double backslashes instead of a single backslash to escape. It didn't work for my code even when I used double backslashes.
Where did I go wrong?
By the way, I'm using Qt Creator 3.6.0 Based on (Desktop) Qt 5.5.1 (Clang 6.1 (Apple), 64 bit)
---Edit----
I tried doing:
std::regex pat (R"(^\[\\w+\]$)");
But I get an error saying
Use of undeclared identifier 'R'
I already have #include <regex> but do I need to include something else?
Either escape the backslashes or use a raw string literal with a delimiter that won't appear in the regex:
escaped:
std::regex pat("^\\[\\w+\\]$");
raw string literal:
std::regex pat(R"regex(^\[\w+\]$)regex");
working demo (adapted from the OP's posted code):
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
int main()
{
auto test_data =
"People list\n"
"[Jane]\n"
"Female\n"
"31";
// initialise test data
std::istringstream source(test_data);
std::string buffer;
std::vector<std::string> lines;
while (std::getline(source, buffer)) {
lines.push_back(std::move(buffer));
}
// test the regex
// read each line, for each [title], add the next lines into its array
std::smatch matches;
for(std::size_t i = 0; i < lines.size(); ++i) { // #lines in text file
static const std::regex pat ("(^\\[\\w+\\]$)");
if(regex_search(lines.at(i), matches, pat)) {
std::cout << "smatch " << matches.str() << std::endl;
std::cout << "smatch.size() = " << matches.size() << std::endl;
} else
std::cout << "wth" << std::endl;
}
return 0;
}
expected output:
wth
smatch [Jane]
smatch.size() = 2
wth
wth

Boost regex cpp for finding strings between %% with output excluding the % character itself

I am having a problem with boost regex in cpp. I want to match a string like
"Hello %world% regex %cpp%" and expected string output is world, cpp
Can somebody suggest a regex for this?
Thanks
Anil
I personally prefer "\\%([^\\%]*)\\%" (or as a raw string R"r(\%([^\%]*)\%)r").
It doesn't rely on non-greedy quantifiers. It is essentially:
one percent character \\%
any number of non-percent characters [^\\%]*
one percent character \\%
I know this is tagged boost, but here's a solution with std::regex:
#include <string>
#include <regex>
#include <iostream>
int main()
{
using namespace std;
string source = "Hello %world%";
regex match_percent_enclosed (R"_(\%([^\%]*)\%)_");
smatch between_percent;
bool found_match = regex_search(source,between_percent,match_percent_enclosed);
if(found_match && between_percent.size()>1)
cout << "found: \"" << between_percent[1].str() << "\"." << endl;
else
cout << "no match found." << endl;
}
This may give you an idea:
%(.+?)%
Result:
Match 1
1. world
Match 2
1. cpp
You can use this regex: \%(.*?)\% (the non-greedy quantifier keeps each group as small as possible).
Online regex: https://regex101.com/r/dSCE2a/2
And here is the code with boost:
#include <iostream>
#include <cstdlib>
#include <boost/regex.hpp>
using namespace std;
int main()
{
boost::cmatch mat;
boost::regex reg( "\\%(.*?)\\%" );
char szStr[] = "Hello %world% regex %cpp%";
const char *where = szStr;
while (regex_search(where, mat, reg))
{
cout << mat[1] << endl; // 0 for whole match, 1 for sub
where = mat[0].second; // continue searching right after this match
}
}

C++ regex library

I have this sample code
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
std::regex e (reg);
std::cout << "Target sequence: " << s << std::endl;
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
I need to get the number between pi and " -> (piMYNUMBER")
In an online regex service my regex (?<=pi)(\d+)(?=") works fine, but C++ regex doesn't match anything.
Who knows what is wrong with my expression?
Best regards
That is correct, C++ std::regex flavors do not support lookbehinds. You need to capture the digits between pi and ":
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main() {
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
// std::string reg(R"(pi(\d+)\")");
std::regex e (reg);
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
std::sregex_token_iterator());
// Demo printing the results:
std::cout << "Number of matches: " << results.size() << std::endl;
for( auto & p : results ) std::cout << p << std::endl;
return 0;
}
See the C++ demo. Output:
Number of matches: 1
656
Here, pi(\d+)" pattern matches
pi - a literal substring
(\d+) - captures 1+ digits into Group 1
" - consumes a double quote.
Note the fourth argument to std::sregex_token_iterator: it is 1 because you need to collect only Group 1 values.

std::regex not working as expected

I googled around but still cannot find the error.
Why does the following code print false, I expected true?
#include <iostream>
#include <regex>
using namespace std;
int main()
{
std::string in("15\n");
std::regex r("[1-9]+[0-9]*\\n",
std::regex_constants::extended);
std::cout << std::boolalpha;
std::cout << std::regex_match(in, r) << std::endl;
}
Using regex_search is not an option.
There is an extra backslash before the "\n" in your regex. The code prints true with just that backslash removed.
#include <iostream>
#include <regex>
using namespace std;
int main()
{
std::string in("15\n");
std::regex r("[1-9]+[0-9]*\n",
std::regex_constants::extended);
std::cout << std::boolalpha;
std::cout << std::regex_match(in, r) << std::endl;
}
Edit: @rici explains why this is an issue in a comment:
Posix-standard extended regular expressions (selected with std::regex_constants::extended) do not recognize C-escape sequences such as \n. See Posix base definitions 9.4.2: "The interpretation of an ordinary character preceded by a backslash ( '\' ) is undefined."

C++ regex_match doesn't match

I'm using C++ on Xcode. I'd like to match non-alphabet characters using regex_match but seem to be having difficulty:
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, const char * argv[])
{
cout << "BY-WORD: " << regex_match("BY-WORD", regex("[^a-zA-Z]")) << endl;
cout << "BYEWORD: " << regex_match("BYEWORD", regex("[^a-zA-Z]")) << endl;
return 0;
}
which returns:
BY-WORD: 0
BYEWORD: 0
I want "BY-WORD" to be matched (because of the hyphen), but regex_match returns a 0 for both tests.
I confoosed.
regex_match tries to match the whole input string against the regular expression you provide. Since your expression would only match a single character, it will always come back false on those inputs.
You probably want regex_search instead.
regex_match() returns whether the whole target sequence matches the regular expression rgx. If you want to search for non-alphabet characters within the target sequence, you need regex_search():
#include <regex>
#include <iostream>
int main()
{
std::regex rx("[^a-zA-Z]");
std::smatch res;
std::string str("BY-WORD");
while (std::regex_search (str,res,rx)) {
std::cout <<res[0] << std::endl;
str = res.suffix().str();
}
}