Remove every occurrence of special characters in QString - c++

How can I remove every occurrence of the special characters ^ and $ in a QString?
I tried:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[^$]."));

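You forgot to escape the ^. At the start of a character class it negates the class, so [^$] matches any character except $ (and the . after it matches one more arbitrary character). To escape the ^, a \ is needed, but that backslash also needs to be escaped because it appears in a C++ string literal. You also want + so that one or more occurrences are matched.
This regular expression should work: [\\^$]+ (as written in the C++ string literal).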
So it has to be:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[\\^$]+"));
Another possibility, as Joe P pointed out in the comments, is:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[$^]+"));
This works because ^ only has a special meaning (negation) at the beginning of a character class; only there do you have to escape it to match it literally.

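You can also use a regular expression that removes every character that is not a letter, a digit or whitespace:
QString str = "$om<Mof*%njas";
// QRegularExpression replaces the obsolete QRegExp (moved to the Qt 5 compatibility module in Qt 6)
str = str.remove(QRegularExpression("[^a-zA-Z\\d\\s]"));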

Related

Regex replace names of methods

I'm trying to replace all occurrences of names within a given string. I'm using regex, since a simple substring match won't work in this case and I need to match full words.
My problem is that I can only match words that are preceded and followed by blanks. For example, I cannot replace a string like:
toReplace()
with:
theReplacement()
My regex replace method looks like this:
#include <regex>
#include <string>

void replaceWord(std::string &str, const std::string& search, const std::string& replace)
{
    // Regular expression to match words beginning with 'search'
    // std::regex e ("(\\b("+search+"))([^,. ]*)");
    // std::regex e ("(\\b("+search+"))\\b)");
    std::regex e("(\\b("+search+"))([^,.()<>{} ]*)");
    str = std::regex_replace(str, e, replace);
}
What should the regex look like in order to ignore leading and trailing non-alphanumeric characters?
You need to:
Escape all special characters in the regex pattern with std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)")).
Escape every $ in the replacement pattern with std::regex_replace(replace, std::regex("[$]"), std::string("$$$$")). This matters if the replacement contains literal text such as $1: in a replacement pattern a literal $ is written as $$, so to turn each $ into $$ we need $$$$ here.
Wrap your search pattern in unambiguous word boundaries, i.e. "(\\W|^)(" + search + ")(?!\\w)".
When you replace, add $1 at the start of the replacement pattern to keep the preceding character, e.g. the whitespace (if it was matched and captured into the first group by the (\W|^) pattern).
See C++ sample code:
#include <iostream>
#include <regex>
#include <string>

std::string replaceWord(const std::string& str, std::string search, std::string replace)
{
    // Escape the literal regex pattern (search is a copy, the caller's string is untouched)
    search = std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)"));
    // Escape the literal replacement pattern
    replace = std::regex_replace(replace, std::regex("[$]"), std::string("$$$$"));
    // Group 1 captures the preceding non-word character (or nothing at the start)
    std::regex e("(\\W|^)(" + search + ")(?!\\w)");
    return std::regex_replace(str, e, std::string("$1") + replace);
}
Then,
std::string text("String toReplace()");
std::string s("toReplace()");
std::string r("theReplacement()");
std::cout << replaceWord(text, s, r);
// => String theReplacement()

Str.global_replace in OCaml putting carets where they shouldn't be

I am working to convert multiline strings into a list of tokens that might be easier for me to work with.
In accordance with the specific needs of my project, I'm padding any caret symbol that appears in my input with spaces, so that "^" gets turned into " ^ ". I'm using something like the following function to do so:
let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)
I then use something like the function below to turn this multiline string into a list of tokens (ignoring whitespace):
let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;
For some reason, bad_function adds carets to places where they shouldn't be. Take the following line of code:
(bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
The first line of the string turns into:
^ This is some \n ^
When I feed the output from bad_function into string_to_tokens I get the following list:
string_to_tokens (bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
"newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
"this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
"^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]
Why is this happening, and how can I fix these functions so that they behave the way I want?
As explained in the documentation of the Str module:
^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.
So you have to escape the '^' character with the escape character "\".
However, note (also from the documentation) that
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser.
This means you have to write the backslash twice, "\\^", to get what you want without a warning.
This should do the job:
let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;

Matlab: using regexp to get a string that has a whitespace in between

I want to use a regex to extract some IDs from a string that looks like a cell array:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
My pattern for regex is as follows:
pattern = '[A-Za-z0-9.^_]+';
newArr = regexp(myString, pattern,'match');
I'd like to get the ID 'HON WI', but my current pattern splits it in two because it doesn't allow whitespace. I would like to get the whole 'HON WI' as well as my other strings, i.e. everything between single quotes. These may contain special characters like ^, . or _, but I don't know how to add the whitespace.
I already tried stuff like this, without success:
pattern = '[A-Za-z0-9.^_\s]+';
My new array should be 1x4, with each cell holding one of the IDs contained in myString (US04650Y1001, US90274P3029, HON WI and US41165F1012).
Another approach seems to work, but I'm not entirely sure about it:
myString = strrep(myString,'([','');
myString = strrep(myString,'])','');
myString = regexp(myString,',','split');
myString = strrep(myString,'''','');
This seems to get me what I want, but I would like to know how I can alter the regex in my first approach.
Many thanks in advance.
You can use a simple '([^']+)' regex and request the 'tokens' output to get the captured groups:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
pattern = '''([^'']+)''';
tok = regexp(myString, pattern, 'tokens');  % one 1x1 cell per match
newArr = [tok{:}];                          % flatten into a 1x4 cell array
The newArr will look like
{
[1,1] = 'US04650Y1001'
[1,2] = 'US90274P3029'
[1,3] = 'HON WI'
[1,4] = 'US41165F1012'
}
Another option is to use lookaround assertions. The following pattern matches any string made of word characters (letters, digits or underscore, \w), spaces (' ') or the characters . and ^ that is located between quotes. It specifically excludes the space after the comma that separates the tokens, i.e. ', ' does not give a match.
Note that \s matches any whitespace character (including tab and newline), which is why a literal space is preferred here:
pattern2='(?<='')[\w.^ ]+(?='')';
pattern2 =
(?<=')[\w.^ ]+(?=')
newArr = regexp(myString, pattern2,'match');
newArr'
ans =
'US04650Y1001'
'US90274P3029'
'HON WI'
'US41165F1012'

C++ Qt QString replace double backslash with one

I have a QString with following content:
"MXTP24\\x00\\x00\\xF4\\xF9\\x80\r\n"
I want it to become:
"MXTP24\x00\x00\xF4\xF9\x80\r\n"
I need to replace "\\x" with "\x" so that I can start parsing the values. But the following code, which I think should do the job, does nothing: I get the same string before and after:
qDebug() << "BEFORE: " << data;
data = data.replace("\\\\x", "\\x", Qt::CaseSensitivity::CaseInsensitive);
qDebug() << "AFTER: " << data;
Here, no change!
Then I tried like this:
data = data.replace("\\x", "\x", Qt::CaseSensitivity::CaseInsensitive);
Then the compiler complains that \x is used with no following hex digits!
Any ideas?
First let's look at what this piece of code does:
data.replace("\\\\x", "\\x", ....
After the compiler processes the escape sequences, the first string literal is the 3-character string \\x (two backslashes followed by x) and the second one is the 2-character string \x. This overload of QString::replace does a plain substring search, not a regular expression search, so it looks for two consecutive backslash characters followed by x.
Your string contains only a single backslash before each x, so nothing matches and nothing changes.
However, that is not your real problem. Your problem is how qDebug() prints strings: it prints them quoted and escaped. A \" at the start of the output means just a plain double quote, one character, in the actual string, because the double quote is escaped. Likewise, each \\ is a single backslash character, because a literal backslash is also escaped (it is the escape character and gives the next character a special meaning).
So it seems you don't need any search-and-replace at all; just remove it.
Try printing the QString in one of these ways to see it literally:
std::cout << data.toStdString() << std::endl;
qDebug() << data.toLatin1().constData();

Wrong return from Regex.IsMatch - Regular expression

I want to find a specific string surrounded by whitespace within another string. For example, I want to get true from:
Regex.IsMatch("I like ZaleK", "zalek",RegexOptions.IgnoreCase)
and false from:
Regex.IsMatch("I likeZaleK", "zalek",RegexOptions.IgnoreCase)
Here is my code:
Regex.IsMatch(w_all_file, @"\b" + TB_string.Text.Trim() + @"\b", RegexOptions.IgnoreCase) ;
It does not work when the string I am looking for is followed by "-" in w_all_file.
For example, if w_all_file = "I like zalek_", the string "zalek" is not found, but if w_all_file = "I like zalek-", the string "zalek" is found.
Any ideas why?
Thanks,
Zalek
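The \b anchor doesn't treat the position between a letter and an underscore as a word boundary, because the underscore is itself a word character. You might want to change it to something like this (inside a character class \b means a backspace, so an alternation is used instead):
Regex.IsMatch(w_all_file, @"(?:\b|_)" + TB_string.Text.Trim() + @"(?:\b|_)", RegexOptions.IgnoreCase) ;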
Is this what you need?
string input = "type your name";
string pattern = "your";
Regex.IsMatch(input, " " + pattern + " ");
\b matches at a word boundary, which is defined as the position between a character that is included in \w and one that is not. For ASCII text, \w is the same as [a-zA-Z0-9_], so it matches underscores.
So basically, \b will match after the "k" in zalek- but not in zalek_.
It sounds like you want the match to also fail on zalek-, which you can do by using lookaround. Just replace the \b at the beginning with (?<![\w-]), and replace the \b at the end with (?![\w-]):
Regex.IsMatch(w_all_file, @"(?<![\w-])" + TB_string.Text.Trim() + @"(?![\w-])", RegexOptions.IgnoreCase) ;
Note that if you add additional characters to the character class [\w-], you need to make sure that the "-" is the very last character, or that you escape it with a backslash (if you don't it will be interpreted as a range of characters).