c++11/regex - search for exact string, escape [duplicate] - c++

This question already has answers here:
std::regex escape special characters for use in regex
(3 answers)
Closed 6 years ago.
Say you have a string which is provided by the user. It can contain any kind of character. Examples are:
std::string s1{"hello world"};   // or
std::string s1{".*"};            // or
std::string s1{"*{}97(}{.}}\\testing___just a --%#$%# literal%$#%^"};
...
Now I want to search in some text for occurrences of >> followed by the input string s1 followed by <<. For this, I have the following code:
std::string input; // the input text
std::regex regex{">> " + s1 + " <<"};
if (std::regex_search(input, regex)) { // regex_match would require the whole text to match
// add logic here
}
This works fine if s1 does not contain any special characters. However, if s1 contains characters that the regex engine treats as special, it doesn't work.
How can I escape s1 such that std::regex considers it as a literal, and therefore does not interpret s1? In other words, the regex should be:
std::regex regex{">> " + ESCAPE(s1) + " <<"};
Is there a function like ESCAPE() in std?
Important: I simplified my question. In my real case the regex is much more complex. Since my only trouble is that s1 gets interpreted as a regex, I left those details out.

You will have to escape all special characters in the string with \. The most straightforward approach is to use another expression to sanitize the input string before building the expression regex.
// matches any characters that need to be escaped in RegEx, including the backslash itself
std::regex specialChars { R"([-[\]{}()*+?.,\\^$|#\s])" };
std::string sanitized = std::regex_replace( s1, specialChars, R"(\$&)" );
// "sanitized" can now safely be used in another expression
std::regex regex{ ">> " + sanitized + " <<" };

Related

C++ RegEx for this pattern [duplicate]

This question already has answers here:
Regex statement in C++ isn't working as expected [duplicate]
(3 answers)
Closed 3 years ago.
I want to be able to find this pattern inside a c++ string. The pattern is as follows:
FIXED_WORD ANY_WORD(...)
where FIXED_WORD refers to a fixed keyword and ANY_WORD can be any word as long as it is followed by a bracket.
I have tried using a regex such as KEYWORD \b(.*)\b\((.*)\), where I tried to use the word boundaries in \b(.*)\b to extract ANY_WORD followed by a bracket:
std::string s = "abcdefg KEYWORD hello(123456)";
std::smatch match;
std::regex pattern("KEYWORD \b(.*)\b\((.*)\)");
if (std::regex_search(s, match, pattern))
{
std::cout << "Match\n";
for (auto m : match)
std::cout << m << '\n';
}
else {
std::cout << "No match\n";
}
I always get "No match" for this.
You're forgetting that backslashes are escape characters in an ordinary string literal: "\b" becomes a backspace character before the regex engine ever sees it. Use a raw string literal, e.g. R"(...)", to preserve the backslashes:
std::regex pattern(R"(KEYWORD \b(.*)\b\((.*)\))");
Then your pattern works as expected:
Match
KEYWORD hello(123456)
hello
123456
https://godbolt.org/z/dJaAAX

C++ Qt QString replace double backslash with one

I have a QString with following content:
"MXTP24\\x00\\x00\\xF4\\xF9\\x80\r\n"
I want it to become:
"MXTP24\x00\x00\xF4\xF9\x80\r\n"
I need to replace "\\x" with "\x" so that I can start parsing the values. But the following code, which I think should do the job, does nothing: I get the same string before and after:
qDebug() << "BEFORE: " << data;
data = data.replace("\\\\x", "\\x", Qt::CaseSensitivity::CaseInsensitive);
qDebug() << "AFTER: " << data;
Here, no change!
Then I tried like this:
data = data.replace("\\x", "\x", Qt::CaseSensitivity::CaseInsensitive);
Then the compiler complains that \x is used with no following hex digits!
Any ideas?
First let's look at what this piece of code does:
data.replace("\\\\x", "\\x", ....
In compiled code the first literal, "\\\\x", becomes \\x (two backslashes followed by x), and the second, "\\x", becomes \x. The QString::replace() overload that takes two strings and a Qt::CaseSensitivity does a plain substring replacement, not a regular-expression one, so this line would turn every double backslash followed by x into a single backslash followed by x, if the string contained any. Since nothing changes, the string evidently contains no double backslashes at all.
The real problem is how qDebug() prints strings: it prints them escaped. A \" in its output stands for a single double-quote character in the actual string, and each \\ is a single backslash character, because the backslash is escaped too (it is the escape character and has special meaning for the next character).
So it seems you don't need to do any search replace at all, just remove it.
Try printing the QString in one of these ways to see it literally:
std::cout << data.toStdString() << std::endl;
qDebug() << data.toLatin1().constData();

Finding number between [/ and ] using regex in C++

I want to find the number between [/ and ] (12345 in this case).
I have written this code:
float num;
string line = "A111[/12345]";
boost::regex e ("[/([0-9]{5})]");
boost::smatch match;
if (boost::regex_search(line, match, e))
{
std::string s1(match[1].first, match[1].second);
num = boost::lexical_cast<float>(s1); //convert to float
cout << num << endl;
}
However, I get this error: The error occurred while parsing the regular expression fragment: '/([0-9]{5}>>>HERE>>>)]'.
You need to escape [ and ], which are special characters in regex denoting character classes, and in an ordinary C++ string literal the escaping backslash itself has to be doubled. The correct regex declaration is:
boost::regex e ("\\[/([0-9]{5})\\]");
This is necessary because the C++ compiler also uses a backslash to escape entities like \n, while the regex engine uses the backslash to escape special characters so that they are treated as literals. Thus, the backslash gets doubled. To match a literal backslash, you have to use four of them (i.e. \\\\).
Use the following (escape [ and ] because they are special characters in regex meaning a character class):
\\[/([0-9]{5})\\]
^^            ^^

C++ Regex getting all matches on a line

When reading line by line, I call this function on each line, looking for function calls (names). I use this function to match any valid characters a-z, 0-9 and _ followed by '('. My problem is that I don't fully understand C++-style regex, or how to make it look through the entire line for all possible matches. This regex is simple and straightforward; it just doesn't work as expected, but I'm learning this is the C++ norm.
void readCallbacks(const std::string lines)
{
std::string regxString = "[a-z0-9]+\(";
regex regx(regxString, std::regex_constants::icase);
smatch result;
if(regex_search(lines.begin(), lines.end(), result, regx, std::regex_constants::match_not_bol))
{
cout << result.str() << "\n";
}
}
You need to escape the backslash or use a raw string literal:
std::regex pattern("[a-z0-9]+\\(", std::regex_constants::icase);
// ^^
std::regex pattern(R"([a-z0-9]+\()", std::regex_constants::icase);
// ###^^^^^^^^^^^##
Also, your character range doesn't contain the desired underscore (_).

C++11 regex to tokenize Mathematical Expression

I have the following code to tokenize a string of the format: (1+2)/((8))-(100*34):
I'd like to throw an error to the user if they use an operator or character that isn't part of my regex.
For example, if the user enters 3^4 or x-6.
Is there a way to negate my regex, search for it, and throw the error if it matches?
Can the regex expression be improved?
#include <queue>
#include <regex>
#include <string>

//Using c++11 regex to tokenize input string
//[0-9]+ = 1 or many digits
//Or [\\-\\+\\(\\)\\/\\*] = "-" or "+" or "/" or "*" or "(" or ")"
std::queue<std::string> tokenize(const std::string& infixExpression) {
    std::regex e("[0-9]+|[\\-\\+\\(\\)\\/\\*]");
    std::sregex_iterator rend;
    std::sregex_iterator a(infixExpression.begin(), infixExpression.end(), e);
    std::queue<std::string> infixQueue;
    while (a != rend) {
        infixQueue.push(a->str());
        ++a;
    }
    return infixQueue;
}
-Thanks
You can run a search on the string using the regular expression [^0-9()+\-*/], written as the C++ string "[^0-9()+\\-*/]", which finds any character that is NOT a digit, a round bracket, a plus or minus sign (actually a hyphen), an asterisk or a slash.
This search should not find anything; if it does, the string contains an unsupported character such as ^ or x.
[...] is a positive character class, which means: find a character that is one of the characters in the square brackets.
[^...] is a negative character class, which means: find a character that is NOT one of the characters in the square brackets.
Within square brackets, the only characters that must be escaped to be interpreted as literal characters are ], \ and -, plus ^ when it is the first character. The - need not be escaped when it is the first or last character in the list. Nevertheless, it is better to always escape - within square brackets, because that makes it unambiguous that the hyphen is a literal character and not a range "FROM x to z".
Of course, this expression does not check for missing closing round brackets. But formula parsers, unlike compilers or script interpreters, often do not require a closing parenthesis for every opening one, simply because it is not needed to calculate the value of the entered formula.
An answer has already been given, but perhaps someone might need this:
[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\(\\)\\/\\*]
This regex separates floats, integers and arithmetic operators.
Here's the trick:
[0-9]?([0-9]*[.])?[0-9]+ -> if it's a number with a decimal point, grab the digits, the point and the digits that follow it; if not, just grab the digits.
Sorry if my answer isn't clear; I just learned regex and found this solution on my own by trial and error.
Here's the code (it takes a mathematical expression and splits all numbers and operators into a vector).
NOTE: I don't know whether it handles whitespace. The mathematical expressions I worked with had no whitespace, e.g. 4+2*(3+1), and it separated everything nicely, but I haven't tried it with whitespace.
/* Separate every int or float or operator into a single string using regular expression and store it in untokenize vector */
string infix; //The string to be parsed (the arithmetic expression, if you will)
vector<string> untokenize;
std::regex words_regex("[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\(\\)\\/\\*]");
auto words_begin = std::sregex_iterator(infix.begin(), infix.end(), words_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
cout << (*i).str() << endl;
untokenize.push_back((*i).str());
}
Output:
(
1
+
2
)
/
(
(
8
)
)
-
(
100
*
34
)