C++ RegEx for this pattern [duplicate]

This question already has answers here:
Regex statement in C++ isn't working as expected [duplicate]
(3 answers)
Closed 3 years ago.
I want to be able to find this pattern inside a C++ string. The pattern is as follows:
FIXED_WORD ANY_WORD(...)
where FIXED_WORD is a fixed keyword and ANY_WORD can be any word, as long as it is immediately followed by an opening parenthesis.
I have tried using a regex such as KEYWORD \b(.*)\b\((.*)\), where I use the word boundaries in \b(.*)\b to extract ANY_WORD followed by a parenthesis:
std::string s = "abcdefg KEYWORD hello(123456)";
std::smatch match;
std::regex pattern("KEYWORD \b(.*)\b\((.*)\)");
if (std::regex_search(s, match, pattern))
{
    std::cout << "Match\n";
    for (auto m : match)
        std::cout << m << '\n';
}
else {
    std::cout << "No match\n";
}
I always get "No match" for this.

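You're forgetting that backslashes are escape characters in an ordinary string literal: "\b" becomes a backspace character and "\(" is an unknown escape sequence, so the regex engine never sees the backslashes you wrote. Use a raw string literal, e.g. R"(...)", to preserve them: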
std::regex pattern(R"(KEYWORD \b(.*)\b\((.*)\))");
Then your pattern works as expected:
Match
KEYWORD hello(123456)
hello
123456
https://godbolt.org/z/dJaAAX

Related

How to use C++ regex to find characters between square brackets [duplicate]

This question already has an answer here:
This regex doesn't work in c++
(1 answer)
Closed 2 years ago.
I have a C++ regex to search for characters inside square brackets. For instance, if the string is x[7:0], I want to return 7:0. This is what my regex looks like:
std::regex reg("\[(.*?)\]");
When I compile it with g++, I get the following warning:
../fixedPointFormatter.cc:30:18: warning: unknown escape sequence: ']'
std::regex reg("[(.*?)]");
^~~~~~~~~~~
The following prints nothing:
if (regex_search(arg, matches, reg)) {
    for (int i=1; i<matches.size(); i++) {
        cout << matches[i] << endl;
    }
}
Can someone help identify what is wrong with this?
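The regex needs the [ and ] to be escaped with \ so that they are treated as literal characters. However, a string literal also treats \ as an escape character, so to pass that backslash through to the regex you must escape it as well, writing \\ in the string: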
std::regex reg("\\[(.*?)\\]");

Regular expression to match input of n words separated by m spaces [duplicate]

This question already has answers here:
Regular expression capturing a repeated group
(1 answer)
c++ std::regex, smatch retains subexpressions only once for their appearance in a pattern string
(1 answer)
Closed 6 years ago.
I'm learning regular expressions in C++11, and I'm trying to create a regular expression that matches an input of N words separated by M spaces.
For example, you input " word word word word ..." and can continue like this for as long as you like.
My problem comes when I try to access the fields of the smatch variable after matching an input against the regular expression. At the moment, what I have is:
#include <regex>
regex input_reg(
    "(?:[[:space:]]*"
    "([[:alpha:]_]+)"
    "[[:space:]]*)+");
smatch comparison;
if (regex_match(input, comparison, input_reg)){
    for (smatch::size_type i = 0; i < comparison.size(); ++i){
        cout << i << ": '" << comparison.str(i) << "'" << endl;
    }
}
The problem is that, although I get a match as expected, when I print all the fields to check the result I only get the whole match and the first field, nothing else:
0: ' word word word word '
1: 'word'
What am I doing wrong?
EDIT: The input is as shown in the output above; for some reason, not all of the spaces in the text are displayed.

c++11/regex - search for exact string, escape [duplicate]

This question already has answers here:
std::regex escape special characters for use in regex
(3 answers)
Closed 6 years ago.
Say you have a string which is provided by the user. It can contain any kind of character. Examples are:
std::string s1{"hello world"};
std::string s2{".*"};
std::string s3{"*{}97(}{.}}\\testing___just a --%#$%# literal%$#%^"};
...
Now I want to search in some text for occurrences of >> followed by the input string s1 followed by <<. For this, I have the following code:
std::string input; // the input text
std::regex regex{">> " + s1 + " <<"};
if (std::regex_match(input, regex)) {
    // add logic here
}
This works fine as long as s1 contains no special characters. However, if s1 contains characters that the regex engine treats as special, such as . or *, it doesn't work.
How can I escape s1 such that std::regex considers it as a literal, and therefore does not interpret s1? In other words, the regex should be:
std::regex regex{">> " + ESCAPE(s1) + " <<"};
Is there a function like ESCAPE() in std?
Important: I simplified my question. In my real case, the regex is much more complex, but since my only problem is that s1 gets interpreted as a regex, I left those details out.
You will have to escape all special characters in the string with \. The most straightforward approach would be to use another expression to sanitize the input string before creating the expression regex.
// matches any characters that need to be escaped in RegEx
std::regex specialChars { R"([-[\]{}()*+?.,\^$|#\s])" };
std::string input = ">> "+ s1 +" <<";
std::string sanitized = std::regex_replace( input, specialChars, R"(\$&)" );
// "sanitized" can now safely be used in another expression

How to match one of multiple alternative patterns with C++11 regex [duplicate]

This question already has an answer here:
Strange results when using C++11 regexp with gcc 4.8.2 (but works with Boost regexp) [duplicate]
(1 answer)
Closed 7 years ago.
With Perl, the following results in a match:
echo xyz | perl -ne 'print if (/.*(yes|no|xy).*/);'
I'm trying to achieve the same thing with a C++ regex. The ECMAScript syntax documentation says
A regular expression can contain multiple alternative patterns simply
by separating them with the separator operator (|): The regular
expression will match if any of the alternatives match, and as soon as
one does.
However, the following example seems to suggest that std::regex_match only matches the first two alternatives, ignoring the third:
std::string pattern1 = ".*(yes|no|xy).*";
std::string pattern2 = ".*(yes|xy|no).*";
std::regex re1(pattern1);
std::regex re2(pattern2);
for (std::string str : {"yesplease", "bayes", "nobody", "anode", "xyz", "abc"} ) {
    if (std::regex_match(str,re1)) {
        std::cout << str << "\t\tmatches " << pattern1 << "\n";
    }
    else if (std::regex_match(str,re2)) {
        std::cout << str << "\t\tmatches " << pattern2 << "\n";
    }
}
Output:
yesplease matches .*(yes|no|xy).*
bayes matches .*(yes|no|xy).*
nobody matches .*(yes|no|xy).*
anode matches .*(yes|no|xy).*
xyz matches .*(yes|xy|no).*
How can I obtain the same behaviour as with my Perl regex example, i.e. having 'xyz' match pattern1?
It looks like std::regex is not fully implemented in GCC 4.8.2; a working implementation only arrived in later versions (GCC 4.9.0 and newer).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631
In GCC 4.9.0 it works as expected (live demo).
So you'll have to upgrade to a newer version of GCC.

Regex in C++11 vs PHP

I'm new to regex and C++11. I want to match an expression like this one:
TYPE SIZE NUMBER ("regina s x99");
I built a regex that looks like this:
\b(regina|margarita|americaine|fantasia)\b \b(s|l|m|xl|xxl)\b x([1-9])([0-9])
In my code, I did this to test the regex:
std::string s("regina s x99");
std::regex rgx($RGX); //$RGX corresponds to the regex above
if (std::regex_match(s, rgx))
    std::cout << "It works !" << std::endl;
This code throws a std::regex_error, but I don't know where it comes from.
This works with g++ 4.9.2 in C++11 mode:
std::regex rgx("\\b(regina|margarita|americaine|fantasia)\\b\\s*(s|l|m|xl|xxl)\\b\\s*x([1-9]*[0-9])");
This captures three groups (regina, s and 99), matching the TYPE SIZE NUMBER pattern, whereas your original captured four groups (regina, s, 9 and 9) and split NUMBER into two values (which may have been what you wanted).
Demo on IdeOne
In C++ string literals the \ character is special and must be escaped so that it is passed through to the regular expression engine rather than interpreted by the compiler.
So you either need to use \\b:
std::regex rgx("\\b(regina|margarita|americaine|fantasia)\\b \\b(s|l|m|xl|xxl)\\b x([1-9])([0-9])");
or use a raw string, which means that \ is not special and doesn't need to be escaped:
std::regex rgx(R"(\b(regina|margarita|americaine|fantasia)\b \b(s|l|m|xl|xxl)\b x([1-9])([0-9]))");
There was a typo in this line in the original question:
if (std::reegex_match(s, rgx))
Moreover, I am not sure what you are passing in through this variable: $RGX
The corrected program is as follows:
#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string s("regina s x99");
    // Corrected version of the regex from the question
    std::regex rgx("\\b(regina|margarita|americaine|fantasia)\\b \\s*(s|l|m|xl|xxl)\\b \\s*x([1-9])([0-9])");
    if (std::regex_match(s, rgx))
        std::cout << "It works !" << std::endl;
    else
        std::cout << "No Match" << std::endl;
}