Diacritical marks in regular expression cause unexpected behavior - c++

I check name validity by this regular expression, allowing any symbol as suggested here:
// Allow any symbol
const QString validNameMatcher = QStringLiteral("^[a-zA-Z0-9 _.,!()+=`,\"#$#%*-]+$");
bool Class::isNameValid(const QString &fileName)
{
    QRegularExpression re(validNameMatcher);
    QRegularExpressionMatch match = re.match(fileName);
    return match.hasMatch();
}
For a file name like 1111 Rick (wow) L50-57.stl the above function returns true. So far so good.
To allow diacritical marks, I just add [À-ž] to the name-matcher as suggested here:
// [À-ž] is for diacritical marks
const QString validNameMatcher = QStringLiteral("^[a-zA-Z0-9À-ž _.,!()+=`,\"#$#%*-]+$");
After adding [À-ž], surprisingly, for the same file name of 1111 Rick (wow) L50-57.stl, the above function returns false. Am I missing something?
UPDATE
As suggested by @WiktorStribiżew, I used UseUnicodePropertiesOption:
QRegularExpression re(validNameMatcher, QRegularExpression::PatternOption::UseUnicodePropertiesOption);
But it didn't work. The result is the same as before.
Also (*UTF) doesn't work:
const QString validNameMatcher = QStringLiteral("(*UTF)^[a-zA-Z0-9À-ž _.,!()+=`,\"#$#%*-]+$");

The key point is @WiktorStribiżew's suggestion to use the QRegularExpression::UseUnicodePropertiesOption option:
QRegularExpression re(validNameMatcher, QRegularExpression::PatternOption::UseUnicodePropertiesOption);
But as mentioned in its documentation:
QRegularExpression::UseUnicodePropertiesOption
The meaning of the \w, \d, etc., character classes, as well as the meaning of their counterparts (\W, \D, etc.), is changed from matching ASCII characters only to matching any character with the corresponding Unicode property.
So, it occurred to me to replace [a-zA-Z0-9À-ž_] in my regular expression with just [\w]:
// Bad:
const QString validNameMatcher = QStringLiteral("^[a-zA-Z0-9À-ž _.,!()+=`,\"#$#%*-]+$");
// Good:
const QString validNameMatcher = QStringLiteral("^[\\w .,!()+=`,\"#$#%*-]+$");
Now the isNameValid() function returns the expected results.

Related

(Qt) Validate string against multiple regular expressions simultaneously

I'm checking a string that contains vehicle registration information against regular expressions for validity. I have a separate regular expression for each criterion I need. How can I validate the string against all my regular expressions without having to combine them into one expression or do something like this to determine if it's valid?
if( s_expGP.exactMatch(lineEdit->text()) ||
s_expGPNew.exactMatch(lineEdit->text()) ||
s_expPersonal.exactMatch(lineEdit->text()) ||
s_expGov.exactMatch(lineEdit->text()) )
{
//do stuff
}
The only option would be to create a single regular expression by combining s_expGP, s_expGPNew, s_expPersonal and the rest if that is possible, otherwise I don't think there could be any other way.
If you have a large number of regular expressions to test, or if you need to verify the string more than once, you can create a function like this:
bool isValid(const QVector<QRegExp>& regExps, const QString& input)
{
    // Valid if the input fully matches at least one expression,
    // like the chain of || tests in the question
    for (const QRegExp& exp : regExps)
    {
        if (exp.exactMatch(input))
            return true;
    }
    return false;
}
Or use a static QVector like you have static regexp.
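As a rough sketch of the "match any of several patterns" idea, the version below uses std::regex from the C++ standard library instead of QRegExp, so it can be tried without Qt. The names matchesAny and registrationPatterns, and the two registration formats, are made up for illustration and are not taken from the question.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Returns true if the whole input matches at least one of the patterns.
// This mirrors a chain of exactMatch() calls joined with ||.
bool matchesAny(const std::vector<std::regex>& patterns, const std::string& input)
{
    return std::any_of(patterns.begin(), patterns.end(),
                       [&input](const std::regex& re) {
                           return std::regex_match(input, re);
                       });
}

// Hypothetical registration formats, for illustration only.
// The vector is built once and reused on every call.
const std::vector<std::regex>& registrationPatterns()
{
    static const std::vector<std::regex> patterns = {
        std::regex(R"([A-Z]{3} \d{3} GP)"),         // e.g. "ABC 123 GP"
        std::regex(R"([A-Z]{2} \d{2} [A-Z]{2} GP)") // e.g. "AB 12 CD GP"
    };
    return patterns;
}
```

A call such as matchesAny(registrationPatterns(), text) then replaces the long if condition, and adding a new format only means adding one entry to the vector.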

check/determine if QString contains html

Since I wasn't able to find a suitable solution here, I wanted to Q&A this question:
Is there a way to determine if a QString is made of html, i.e. is rich-text, (or at least, contains html)?
This may be the case for unknown/QVariant calls to setData of data editors in the table/view model.
A solution can be to use Qt::mightBeRichText for QString:
#include <QTextDocument>
QString ensurePlainText(const QString& text)
{
    QString out;
    if (Qt::mightBeRichText(text))
    {
        // looks like html -> convert to plain text
        QTextDocument doc;
        doc.setHtml(text);
        out = doc.toPlainText();
    }
    else
    {
        out = text;
    }
    return out;
}
It is important to note that the presented method uses a heuristic. It may fail to detect html or falsely detect html in a non-html text. The former may return html tags in the string. The latter would, for instance, strip newline characters from the text.

How to check a specified string is a valid URL or not using C++ code

Is there any way to check whether a given string is a valid URL? The solution must be in C++ and it should work without an internet connection.
Example strings are:
good.morning
foo.goo.koo
https://hhhh
hdajdklbcbdhd
8881424.www.hfbn55.co.in/sdfsnhjk
://dgdh24.vom
dfgdfgdf(2001)/.com/sdgsgh
\adiihsdfghnhg.co.inskdhhj
aser//www.gtyuh.co.uk/kdsfgdfgfrgj
Choose a suitable regular expression like /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/.
Use std::regex, or Boost.Regex if you don't have C++11. Write the pattern as a raw string literal so the backslashes are not consumed by the C++ compiler:
if (std::regex_match("http://subject", std::regex(R"(^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$)"))) {
// ...
}
You could use a regular expression.
With C++11, regular expressions are built into the standard library (the <regex> header).
If you cannot use C++11 for some reason, you could use the Boost library.
Either way, you could check the pattern of a URL with:
#include <iostream>
#include <regex> // requires C++11
#include <string>
// ...
// Regex pattern, written as a raw string literal so the backslashes reach the regex engine
std::string pattern = R"(https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))";
// Construct regex object
std::regex url_regex(pattern);
// A URL string, for example
std::string my_url = "http://www.google.com/img.png";
// Check for a match of the whole string
if (std::regex_match(my_url, url_regex)) {
    std::cout << "This is a well-formed url\n";
} else {
    std::cout << "Ill-formed url\n";
}
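To see how such a check behaves on a few of the sample strings from the question, here is a minimal sketch that wraps the pattern in a function. The name looksLikeHttpUrl is invented for this example, and the pattern is a lightly simplified copy of the one above (redundant escapes dropped), so treat it as an approximation rather than a full URL validator. Note that it requires an http or https scheme, so bare host names are rejected.

```cpp
#include <regex>
#include <string>

// Returns true if the whole string matches an http(s) URL pattern.
// The raw string literal R"(...)" keeps the regex backslashes intact.
bool looksLikeHttpUrl(const std::string& candidate)
{
    static const std::regex urlRegex(
        R"(https?://(www\.)?[-a-zA-Z0-9#:%._+~=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_+.~?&/=]*))");
    return std::regex_match(candidate, urlRegex);
}
```

With this sketch, "http://www.google.com/img.png" is accepted, while "hdajdklbcbdhd", "https://hhhh" and "://dgdh24.vom" are rejected. Whether strings without a scheme, such as "good.morning", should count as valid depends on your requirements; if they should, the scheme part of the pattern would have to become optional.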

QString variable changed to QCharRef when i use pointers in method

Hello everyone, I am trying to get to know pointers better and I stumbled into a Qt type change. I made a QString array and passed a pointer to the array to a method. But when I try to use a QString function, it gives an error saying that it is a QCharRef, which does not have the member function isEmpty().
The code:
QString data_array[2][3] =
{
{"11:28:8","Room 1","Presence detected"},
{"11:38:8","Room 1","No presence"}
};
bool method(QString *_data_array)
{
QString *data_array = _data_array;
return data_array[0][1].isEmpty(); /* changed to QCharRef */
}
My question is why does this happen and how can I prevent it or change it?
The reason for which you are getting QCharRef is due to how QString is built. The [] operator returns one character from a QString (QString is built up from QChars, much like strings in C/C++ are character arrays). From the Qt documentation:
The return value is of type QCharRef, a helper class for QString. When you get an object of type QCharRef, you can use it as if it were a QChar &. If you assign to it, the assignment will apply to the character in the QString from which you got the reference.
So what that means for you is that when you use the lovely square bracket operators, you are no longer using a QString, you are using a QChar reference.
As for how to change it, QChar's isNull() seems like it would fit your use. So instead, try return data_array[0][1].isNull(); and that should work.
I would also look into QStringList if you're working with lists of strings.
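The same indexing behaviour can be reproduced with std::string, which makes it easy to try without Qt. The sketch below is an analogy, not Qt code: with a plain pointer to a string, the second [] indexes into the string and yields a single character, whereas a pointer to a row of three strings keeps both indices on the array. The names kData, secondCharOfFirstString and cellIsEmpty are invented for this example.

```cpp
#include <string>

const std::string kData[2][3] = {
    {"11:28:8", "Room 1", "Presence detected"},
    {"11:38:8", "Room 1", "No presence"}
};

// With a flat pointer, p[0] is the first string and p[0][1] is its second
// character (a char), which is the analogue of getting a QCharRef.
char secondCharOfFirstString(const std::string* p)
{
    return p[0][1];
}

// With a pointer to rows of three strings, rows[row][col] is a whole string,
// so string members such as empty() are available.
bool cellIsEmpty(const std::string (*rows)[3], int row, int col)
{
    return rows[row][col].empty();
}
```

Here cellIsEmpty(kData, 0, 1) inspects the string "Room 1", while secondCharOfFirstString(&kData[0][0]) only sees one character of "11:28:8". Declaring the parameter as a pointer to rows is one way to keep access to the string's own member functions.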

regex with Qt - indexIn(const QString &) does not work as expected

I am using QRegExp to find out whether a QString contains some pattern. There is no compile error, but no match is found at runtime where one should normally occur. I tested the regexp in a Python shell and the match occurs with Python. I checked in the Qt documentation that the syntax is the same for the regexp I am using. Here is a code sample:
bool Thing::isConstraint(const QString &cstr_)
{
QRegExp lB1("^(\d+\.?\d*|\d*\.\d+)<=PARAM(\d+)$");
QRegExp lB2("^PARAM(\d+)>=(\d+\.?\d*|\d*\.\d+)$");
QRegExp lB3("^PARAM(\d+)>(\d+\.?\d*|\d*\.\d+)$");
QRegExp lB4("^(\d+\.?\d*|\d*\.\d+)<PARAM(\d+)$");
QRegExp uB5("^(\d+\.?\d*|\d*\.\d+)>=PARAM(\d+)$");
QRegExp uB6("^(\d+\.?\d*|\d*\.\d+)>PARAM(\d+)$");
QRegExp uB7("^PARAM(\d+)<=(\d+\.?\d*|\d*\.\d+)$");
QRegExp uB8("^PARAM(\d+)<(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB9("^(\d+\.?\d*|\d*\.\d+)>=PARAM(\d+)>=(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB10("^(\d+\.?\d*|\d*\.\d+)>PARAM(\d+)>=(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB11("^(\d+\.?\d*|\d*\.\d+)>=PARAM(\d+)>(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB12("^(\d+\.?\d*|\d*\.\d+)>PARAM(\d+)>(\\d+\.?\d*|\d*\.\d+)$");
QRegExp luB13("^(\d+\.?\d*|\d*\.\d+)<=PARAM(\d+)<=(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB14("^(\d+\.?\d*|\d*\.\d+)<=PARAM(\d+)<(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB15("^(\d+\.?\d*|\d*\.\d+)<PARAM(\d+)<=(\d+\.?\d*|\d*\.\d+)$");
QRegExp luB16("^(\d+\.?\d*|\d*\.\d+)<PARAM(\d+)<(\d+\.?\d*|\d*\.\d+)$");
int pos_=0;
if((pos_ = lB1.indexIn(cstr_)) != -1)
{
m_func->setLowerBound((lB1.cap(2)).toInt(),(lB1.cap(1)).toDouble());
return true;
}
else if((pos_ = lB2.indexIn(cstr_)) != -1)
{
m_func->setLowerBound((lB2.cap(1)).toInt(),(lB2.cap(2)).toDouble());
return true;
}
/*
...
*/
return false;
}
This method is called in this other method:
void Thing::setConstraints(QStringList &constraints_)
{
if(!m_func)
return;
for(int j=0;j<constraints_.size();)
{
// advance only when nothing was removed, otherwise the next element is skipped
if(isConstraint(constraints_.at(j)))
constraints_.removeAt(j);
else
j++;
}
m_func->setConstraints(constraints_);
}
In VS2010 Watch, error for lB1.indexIn(cstr_) is: Error: argument list does not match a function .
Second, I would like the isConstraint() method to begin with this check and replace for whitespaces:
QRegExp wsp ("\s+");
cstr_.replace(wsp,"");
How can I do this without using const_cast?
Thanks and regards.
edit ---------
I needed to double the backslashes in C++, which is different from Python. Thanks!
I think you asked two questions, so I'll try to answer them:
1) Your regular expressions are most likely not passing because you need to escape your backslashes so that C++ doesn't mess up your strings. For example:
QRegExp lB1("^(\\d+\\.?\\d*|\\d*\\.\\d+)<=PARAM(\\d+)$");
2) To avoid using const_cast you can either change your function signature to this:
bool Thing::isConstraint( QString cstr_)
or make a copy of the cstr_ object and operate on the copy instead.
As a side note, you may want to take a look at the QRegExp::exactMatch() function which obviates the need to use ^ and $ at the beginning and end of all of your expressions, and also has a bool return value which would make your if statements a little cleaner.
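Both points can be illustrated with a small standard-library sketch that uses std::regex rather than QRegExp, so it runs without Qt. The function names stripWhitespace and parseLowerBound are hypothetical. The raw string literal R"(...)" is an alternative to doubling every backslash, and working on a copy of the input avoids any need for const_cast.

```cpp
#include <regex>
#include <string>

// Removes all whitespace from a copy of the input, leaving the caller's
// string untouched (so no const_cast is needed).
std::string stripWhitespace(const std::string& text)
{
    static const std::regex whitespace(R"(\s+)");
    return std::regex_replace(text, whitespace, "");
}

// Parses constraints of the form "<number><=PARAM<index>".
// On success, stores the bound and the parameter index and returns true.
bool parseLowerBound(const std::string& constraint, double& bound, int& paramIndex)
{
    // Same pattern as the escaped literal "^(\\d+\\.?\\d*|\\d*\\.\\d+)<=PARAM(\\d+)$"
    static const std::regex pattern(R"(^(\d+\.?\d*|\d*\.\d+)<=PARAM(\d+)$)");
    const std::string cleaned = stripWhitespace(constraint);
    std::smatch match;
    if (!std::regex_match(cleaned, match, pattern))
        return false;
    bound = std::stod(match[1].str());     // first capture: the number
    paramIndex = std::stoi(match[2].str()); // second capture: the index
    return true;
}
```

For an input such as "3.5 <= PARAM2", the whitespace is stripped first, then the first capture group holds the bound and the second the parameter index. Since std::regex_match already requires the whole string to match, the ^ and $ anchors are redundant here, much like with QRegExp::exactMatch().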