C++ Qt QString replace double backslash with one - c++

I have a QString with following content:
"MXTP24\\x00\\x00\\xF4\\xF9\\x80\r\n"
I want it to become:
"MXTP24\x00\x00\xF4\xF9\x80\r\n"
I need to replace each "\\x" with "\x" so that I can start parsing the values. But the following code, which I think should do the job, does nothing: I get the same string before and after:
qDebug() << "BEFORE: " << data;
data = data.replace("\\\\x", "\\x", Qt::CaseSensitivity::CaseInsensitive);
qDebug() << "AFTER: " << data;
Here, no change!
Then I tried like this:
data = data.replace("\\x", "\x", Qt::CaseSensitivity::CaseInsensitive);
Then the compiler complains that \x is used with no following hex digits!
Any ideas?

First let's look at what this piece of code does:
data.replace("\\\\x", "\\x", ....
The first string becomes the three characters \\x in the compiled code. This overload of replace(), which takes two strings and a Qt::CaseSensitivity, treats its first argument as plain text and not as a regular expression, so the call searches for two literal backslashes followed by x. (With a regular expression you would need four backslashes in a C++ string literal to match one literal backslash in the target text.)
The replacement string is the two characters \x. So if your string really contained double backslashes, this call would have collapsed each \\x into \x. It changed nothing, which means there is no \\x in the string to begin with.
That leads to your actual problem, which is how qDebug() prints strings: it prints them escaped. The \" at the start of the output is just a plain double quote, one character, in the actual string, because the double quote is escaped. Likewise, each \\ is a single backslash character, because a literal backslash is also escaped (it is the escape character and gives special meaning to the next character).
So it seems you don't need to do any search replace at all, just remove it.
Try printing the QString in one of these ways to get it shown literally:
std::cout << data.toStdString() << std::endl;
qDebug() << data.toLatin1().constData();

Related

regex_replace is returning empty string

I am trying to remove all characters that are not digit, dot (.), plus/minus sign (+/-) with empty character/string for float conversion.
When I pass my string through regex_replace function I am returned an empty string.
I believe something is wrong with my regular expression std::regex reg_exp("\\D|[^+-.]")
Code
#include <iostream>
#include <regex>
int main()
{
    std::string temporary_recieve_data = " S S +456.789 tg\r\n";
    std::string::size_type sz;
    const std::regex reg_exp("\\D|[^+-.]"); // matches not digit, decimal point (.), plus sign, minus sign
    std::string numeric_string = std::regex_replace(temporary_recieve_data, reg_exp, ""); //replace the character that are not digit, dot (.), plus-minus sign (+,-) with empty character/string for float conversion
    std::cout << "Numeric String : " << numeric_string << std::endl;
    if (numeric_string.empty())
    {
        return 0;
    }
    float data_value = std::stof(numeric_string, &sz);
    std::cout << "Float Value : " << data_value << std::endl;
    return 0;
}
I have been trying to evaluate my regular expression on regex101.com for the past 2 days, but I am unable to figure out what is wrong with it. When I use just \D, the editor substitutes non-digit characters properly, but as soon as I add the alternation | for not dot ., plus + or minus - sign, the editor returns an empty string.
The string is empty because your regex matches every character.
\D already matches every character that is not a digit, so the plus, the hyphen and the period are consumed by it.
The digits are consumed by the negated class [^+-.].
Furthermore, the hyphen indicates a range inside a character class.
Either escape it or put it at the start or end of the character class.
(Funnily enough, the range +-. used here, code points 43-46, happens to contain the hyphen.)
Remove the alternation with \D and put \d into the negated class:
[^\d.+-]+
See this demo at regex101 (appending + to match one or more characters at a time is more efficient).

Str.global_replace in OCaml putting carets where they shouldn't be

I am working to convert multiline strings into a list of tokens that might be easier for me to work with.
In accordance with the specific needs of my project, I'm padding any caret symbol that appears in my input with spaces, so that "^" gets turned into " ^ ". I'm using something like the following function to do so:
let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)
I then use something like the below function to then turn this multiline string into a list of tokens (ignoring whitespace).
let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;
For some reason, bad_function adds carets to places where they shouldn't be. Take the following line of code:
(bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
The first line of the string turns into:
^ This is some \n ^
When I feed the output from bad_function into string_to_tokens I get the following list:
string_to_tokens (bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
"newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
"this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
"^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]
Why is this happening, and how can I fix so these functions behave like I want them to?
As explained in the documentation of the Str module:
^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.
So you have to quote the '^' character using the escape character "\".
However, note that (also from the doc)
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser.
This means you have to put a double '\' to do what you want without getting a warning.
This should do the job:
let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;

Using one cout command to print multiple strings with each string placed on a different (text editor) line

Take a look at the following example:
cout << "option 1:
\n option 2:
\n option 3";
I know it's not the best way to output a string, but the question is: why does this cause an error saying that a " character is missing? There is a single string that must go to stdout; it just contains a lot of whitespace characters.
What about this:
string x="
string_test";
One may interpret that string as: "\nxxxxxxxxxxxxstring_test" where x is a whitespace character.
Is it a convention?
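That's called a multiline string literal.
You need to escape the embedded newline with a backslash, otherwise it will not compile. The backslash-newline pair is removed from the source, so the resulting string contains no line break at that point: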
std::cout << "Hello world \
and stackoverflow";
Note: the backslash must come immediately before the end of the line, because it escapes the newline in the source.
You can also take advantage of the fact that adjacent string literals are concatenated by the compiler:
std::cout << "Hello World"
"Stack overflow";
Since C++11 there are also raw string literals. They work somewhat like here-documents.
Syntax:
prefix(optional) R"delimiter( raw_characters )delimiter"
It allows any character sequence, except that it must not contain the
closing sequence )delimiter". It is used to avoid escaping of any
character. Anything between the delimiters becomes part of the string.
const char* s1 = R"foo(
Hello
World
)foo";
Example taken from cppreference.

Remove every occurrence of special characters in QString

How can I remove every occurrence of the special characters ^ and $ in a QString?
I tried:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[^$]."));
You forgot to escape the ^. Escaping it requires a \, which itself has to be escaped in a C++ string literal. You also want + so that one or more occurrences are matched.
This regular expression should work: [\\^$]+
So it has to be:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[\\^$]+"));
Another possibility, suggested in the comments by Joe P, is:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[$^]+"));
because ^ only has a special meaning at the beginning of a character class, where you have to escape it to get it literally.
You can also try a regular expression that removes every character that is not a letter, a digit or whitespace:
QString str = "$om<Mof*%njas";
str = str.remove(QRegExp("[^a-zA-Z\\d\\s]"));

QRegExp not finding expected string pattern

I am working in Qt 5.2, and I have a piece of code that takes in a string and enters one of several if statements based on its format. One of the formats searched for is the letters "RCV", followed by a variable number of digits, a decimal point, and then one more digit. There can be more than one of these values in the line, separated by "|"; for example, it could be one value like "RCV0123456.1" or multiple values like "RCV12345.1|RCV678.9". Right now I am using the QRegExp class to find this, like this:
QString value = "RCV000030249.2|RCV000035360.2"; //Note: real test value from my code
if(QRegExp("^[RCV\d+\.\d\|?]+$").exactMatch(value))
std::cout << ":D" << std::endl;
else
std::cout << ":(" << std::endl;
I want it to use the if statement, but it keeps going into the else statement. Is there something I'm doing wrong with the regular expression?
Your expression should look like this, as @vahancho mentioned in a comment:
if(QRegExp("^[RCV\\d+\\.\\d\\|?]+$").exactMatch(value))
If you use C++11, then you can use its raw strings feature:
if(QRegExp(R"(^[RCV\d+\.\d\|?]+$)").exactMatch(value))
Aside from escaping the backslashes, which others have mentioned in answers and comments, consider this part of your question:
"There can be more than one of these values in the line, separated by "|", for example it could be one value like "RCV0123456.1" or multiple values like "RCV12345.1|RCV678.9"."
[RCV\d+\.\d\|?] may not be doing what you expect. Perhaps you want () instead of []:
/^
  [RCV\d+\.\d\|?]+   # One or more characters from the list:
                     # R, C, V, a digit, a +, a dot, a digit, a |, a ?
$/x
/^
  (
    RCV\d+\.\d       # RCV, some digits, a dot, followed by a digit
    \|?              # Optional: a |
  )+                 # Quantifier of one or more
$/x
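Also, you could revise the regex so that each | must be followed by another occurrence of the group. Note that the (?1) subpattern call is PCRE syntax, which as far as I know QRegExp does not support, so this form would need QRegularExpression:
/^
  (RCV\d+\.\d)       # RCV, some digits, a dot, followed by a digit
  (
    \|(?1)           # A |, then match subpattern 1 (above)
  )*                 # Quantifier of zero or more, so a single value still matches
$/x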
To check that the line contains only valid occurrences, and to additionally require a | before the second and every later occurrence (your character-class version would not enforce the | even with the backslashes doubled):
QString value = "RCV000030249.2|RCV000035360.2"; //Note: real test value from my code
if(QRegExp("^RCV\\d+\\.\\d(\\|RCV\\d+\\.\\d)*$").exactMatch(value))
std::cout << ":D" << std::endl;
else
std::cout << ":(" << std::endl;