Qt Regular Expression Escape Sequence Problem - regex

I'm struggling to get a regular expression working. I'm using Qt Creator on an Ubuntu system. I tested my regex against an example number with a third-party tool, so I believe the problem is not with the expression itself.
My desired regex:
/\b(9410 ?\d{18})\b/i
I am putting the regex string into a QString variable, which results in a compiler error:
QString test = "/\b(9410 ?\d{18})\b/i";
unknown escape sequence '\d'
In an attempt to fix it, I added an extra \ at the point of the error:
QString test = "/\b(9410 ?\\d{18})\b/i";
qWarning() << test;
Debugger indicates (note the \\):
/\b(9410 ?\\d{18})\b/i
I also tried a raw string:
QString test = R"(/\b(9410 ?\d{18})\b/i)";
qWarning() << test;
Debugger shows all single \ replaced with \\.
/\\b(9410 ?\\d{18})\\b/i
None of these attempts has resulted in a working regex. There is something fishy going on with the backslashes. I'd appreciate your thoughts. I must be missing something simple...
EDIT: Here is some simplified code. When I run this it returns "FALSE" indicating no match. I tested this regex and number at regex101.com. Works there. That's why I believe something is flawed in my implementation. Just can't put my finger on it.
QRegularExpression re;
QString test = R"(/\b(9410 ?\d{18})\b/i)";
re.setPattern(test);
if(re.match("9410811298370146293071").hasMatch())
{
qWarning() << "TRUE";
}
else {
qWarning() << "FALSE";
}

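I cleaned up the regex and it now matches. The leading / and the trailing /i are delimiter and flag syntax from other tools (regex101, Perl, JavaScript); QRegularExpression treats them as literal characters of the pattern, so they have to be removed. If case-insensitive matching is needed, pass QRegularExpression::CaseInsensitiveOption instead of /i. In an ordinary (non-raw) string literal every backslash must be doubled, because "\b" is a backspace character to the compiler. With the raw string literal the pattern was already correct; the doubled backslashes in the debug output are only how the string is displayed.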
QRegularExpression re;
QString test = R"(9410 ?\d{18})";
re.setPattern(test);
if(re.match("9410811298370146293071").hasMatch())
{
qWarning() << "TRUE";
}
else {
qWarning() << "FALSE";
}
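The same point can be checked without Qt. The following is a minimal sketch that uses std::regex as a stand-in for QRegularExpression (an assumption made only so the example compiles with a plain C++ compiler); the names escapedPattern, rawPattern, isTrackingNumber and matchesWithDelimiters are hypothetical. It shows that the escaped and the raw literal spell the same pattern, and that keeping the /.../i delimiters makes the match fail.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for QRegularExpression here.
// The escaped literal and the raw literal contain exactly the same characters.
const std::string escapedPattern = "\\b(9410 ?\\d{18})\\b";
const std::string rawPattern = R"(\b(9410 ?\d{18})\b)";

// Case-insensitivity is requested with a flag, not with a trailing "/i".
bool isTrackingNumber(const std::string& text) {
    static const std::regex re(rawPattern,
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}

// With the delimiters left in, "/" and "/i" are literal characters that the
// subject string would have to contain.
bool matchesWithDelimiters(const std::string& text) {
    static const std::regex re(R"(/\b(9410 ?\d{18})\b/i)");
    return std::regex_search(text, re);
}
```

Calling isTrackingNumber("9410811298370146293071") succeeds, while matchesWithDelimiters on the same input fails because the number contains no slash.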


How to organize or extract info from a QByteArray

I have a program that receives a full block of data in a single QByteArray. The block is divided into lines by carriage returns followed by line feeds (\r\n). Somewhere in all this data there is a date, specifically on the third line (between the second and the third \r\n).
Every time I try to extract this date from the QByteArray I end up with random junk. How can I be more precise with the QByteArray?
What is the best way of extracting this date without altering my QByteArray? Take into consideration that I don't know the date in advance and that it may even be in the wrong format.
For illustration, here is an example of my QByteArray:
RandomName=name\r\nRandomID=ID\r\nRandomDate=date\r\nRandomTime=time\r\nRandomWhatever=whatever(...)
EDIT:
Sorry for my bad English.
Let's say I have the following text sent to me:
ProgName = Marcus
ProgID = 180
ProgDate = 15.01.16
ProgTime = 13:39
(More info)......
However, none of this information is useful to me... except the Date. Everything was stored in a single QByteArray (Let's call it 'ba'). So this is my ba:
ProgName(space)=(space)Marcus\r\nProgID(space)=(space)180\r\nProgDate(space)=(space)15.01.16\r\nProgTime(space)=(space)13:39\r\n (keeps going)
My problem is: Storing "15.01.16" (the "ProgDate") in a QString without altering or destroying ba.
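There are a variety of ways; try one of the following solutions.
1) Using split(). Note that QByteArray::replace() modifies the array it is called on, so work on a copy to leave the original untouched:
// Copy first, then normalize the line endings on the copy only
const QByteArray normalized = QByteArray(yourByteArray).replace("\r\n", "\n");
foreach (auto subByte, normalized.split('\n')) {
    qDebug() << subByte;
    // Each line is "key = value"; split it at the '=' sign
    foreach (auto val, subByte.split('=')) {
        qDebug() << val.trimmed();
    }
}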
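2) Using QRegularExpression/QRegularExpressionMatchIterator to collect every (key, value) pair. The pattern allows optional spaces around '=' and takes the rest of the line as the value, so a value such as 15.01.16 is captured in full:
QRegularExpression re("(\\w+)\\s*=\\s*([^\\r\\n]+)");
QRegularExpressionMatchIterator i = re.globalMatch(yourByteArray);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    // captured(0) is the whole match, captured(1) the key, captured(2) the value
    qDebug() << match.captured(0) << match.captured(1) << match.captured(2);
}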
3) Using QRegularExpression/QRegularExpressionMatch to look up a single key:
QRegularExpression re("(RandomDate)\\s*=\\s*([^\\r\\n]+)");
QRegularExpressionMatch match = re.match(yourByteArray);
if (match.hasMatch())
    qDebug() << match.captured(0) << match.captured(1) << match.captured(2); // captured(2) is the date text
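For comparison, here is a minimal sketch of the same lookup without a regex and without touching the buffer. It assumes std::string_view as a read-only stand-in for the QByteArray (so it compiles without Qt, C++17 or later); trim and findValue are hypothetical helper names, not part of any library.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Strip leading and trailing spaces/tabs from a view without copying.
std::string_view trim(std::string_view s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Return the value of the first "key = value" line, or nullopt if the key
// is absent. The buffer is only read, never modified.
std::optional<std::string> findValue(std::string_view buffer,
                                     std::string_view key) {
    std::size_t pos = 0;
    while (pos < buffer.size()) {
        std::size_t end = buffer.find("\r\n", pos);
        if (end == std::string_view::npos) end = buffer.size();
        const std::string_view line = buffer.substr(pos, end - pos);
        const std::size_t eq = line.find('=');
        if (eq != std::string_view::npos && trim(line.substr(0, eq)) == key) {
            return std::string(trim(line.substr(eq + 1)));
        }
        pos = end + 2;  // skip past the "\r\n" separator
    }
    return std::nullopt;
}
```

With the sample block from the question, findValue(ba, "ProgDate") yields the text after the '=' on that line as a new string. Whether that text is a valid date still has to be checked separately.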

Qt Using QRegularExpression multiline option

I'm writing a program that uses QRegularExpression with MultilineOption. I wrote the code below, but matching stops at the first line. Why? What am I doing wrong?
QString recv = "AUTH-<username>-<password>\nINFO-ID:45\nREG-<username>-<password>-<name>-<status>\nSEND-ID:195-DATE:12:30 2/02/2015 <esempio>\nUPDATEN-<newname>\nUPDATES-<newstatus>\n";
QRegularExpression exp = QRegularExpression("(SEND)-ID:(\\d{1,4})-DATE:(\\d{1,2}):(\\d) (\\d{1,2})\/(\\d)\/(\\d{2,4}) <(.+)>\\n|(AUTH)-<(.+)>-<(.+)>\\n|(INFO)-ID:(\\d{1,4})\\n|(REG)-<(.+)>-<(.+)>-<(.+)>-<(.+)>\\n|(UPDATEN)-<(.+)>\\n|(UPDATES)-<(.+)>\\n", QRegularExpression::MultilineOption);
qDebug() << exp.pattern();
QRegularExpressionMatch match = exp.match(recv);
qDebug() << match.lastCapturedIndex();
for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
qDebug() << match.captured(i);
}
Can someone help me?
The answer is that you should use the globalMatch() method rather than match(), because match() returns only the first match.
The QRegularExpression documentation describes globalMatch() as follows:
Attempts to perform a global match of the regular expression against
the given subject string, starting at the position offset inside the
subject, using a match of type matchType and honoring the given
matchOptions. The returned QRegularExpressionMatchIterator is
positioned before the first match result (if any).
Also, you can remove the QRegularExpression::MultilineOption option: it only changes how the ^ and $ anchors behave, and your pattern does not use them.
Sample code:
QRegularExpressionMatchIterator i = exp.globalMatch(recv);
while (i.hasNext()) {
QRegularExpressionMatch match = i.next();
// ...
}
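The difference between a single match and a global match can be sketched with the standard library as well. This is only an illustration under the assumption that std::regex_search corresponds to match() and std::sregex_iterator to globalMatch(); the pattern is a simplified version of the one in the question, and commandRe, firstCommand and allCommands are hypothetical names.

```cpp
#include <regex>
#include <string>
#include <vector>

// Simplified pattern: a command keyword, a dash, the rest of the line.
// Group 1 is the keyword, group 2 the arguments.
const std::regex commandRe("(AUTH|INFO|REG|SEND|UPDATEN|UPDATES)-([^\n]*)\n");

// Single search: stops at the first line that matches.
std::string firstCommand(const std::string& text) {
    std::smatch m;
    return std::regex_search(text, m, commandRe) ? m[1].str() : std::string();
}

// Iterating over all matches: visits every matching line in order.
std::vector<std::string> allCommands(const std::string& text) {
    std::vector<std::string> commands;
    for (std::sregex_iterator it(text.begin(), text.end(), commandRe), end;
         it != end; ++it) {
        commands.push_back((*it)[1].str());
    }
    return commands;
}
```

On the recv string from the question, firstCommand stops at the AUTH line, while allCommands walks through all six lines.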
I actually found this question while searching for a similar issue, but I can't completely agree with the answer above. I think most questions about multi-line matching with the new QRegularExpression can be answered as follows:
Use the QRegularExpression::DotMatchesEverythingOption option, which allows the dot (.) to match newline characters. This is extremely useful when porting from QRegExp.
Your pattern is an alternation (an "or" expression): as soon as the first alternative matches, the job is done.
I think it will work if you split the string and loop over the resulting array, comparing each line with this expression.
If the data always has the same structure, you can use something like this:
"(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>"
21 Matches

How to match absolute value using regex

I am having trouble matching an absolute-value expression with a regex in C++. This is the pattern I have:
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|"); // load -|M(x)|
I am trying to match with std::tr1::regex_match( IR, result, loadAbsNM ), but it is not matching anything, even though it should.
I'm using the Visual Studio 2010 compiler.
Shortened version of the program (iostream and regex are included above it):
int main()
{
std::string IR = "load -|M(x)|";
std::smatch result;
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|");
if( std::tr1::regex_match( IR , result, loadAbsNM ) )
{
int x = 2;
std::cout << "matched!" << std::endl;
}
else
{
std::cout << "!UNABLE TO DECODE INSTRUCTION!" << std::endl;
}
}
output produced
!UNABLE TO DECODE INSTRUCTION!
Note that your code is not going to produce a match: the letter x won't match the regex \d+.
Also, the pipe characters need a backslash in front of them. An unescaped pipe (|) separates alternatives: (a|b) means a or b. Your pattern is therefore read as three alternatives: "load -", "M\((\d+)\)", and the empty string.
Finally, since there is a pipe at the end, the expression can match the empty string, which is often a bad idea.
I would suggest something like this:
"load -\\|M\\((\\d+)\\)\\|"
But that won't match:
"load -|M(x)|"
You'd need to use a number instead of 'x' as in:
"load -|M(123)|"

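A minimal sketch of the suggested pattern, written with std::regex rather than the std::tr1 namespace used in the question (an assumption; it needs a C++17 compiler for std::optional). The function name parseLoadAbs is hypothetical.

```cpp
#include <optional>
#include <regex>
#include <string>

// Parse "load -|M(<digits>)|" and return the number, or nullopt when the
// instruction does not have that exact shape. Both pipes are escaped so
// they are matched literally instead of acting as alternation.
std::optional<int> parseLoadAbs(const std::string& instruction) {
    static const std::regex re(R"(load -\|M\((\d+)\)\|)");
    std::smatch m;
    if (!std::regex_match(instruction, m, re)) return std::nullopt;
    return std::stoi(m[1].str());  // may throw for numbers too large for int
}
```

parseLoadAbs("load -|M(123)|") returns 123, while parseLoadAbs("load -|M(x)|") returns an empty optional because x is not a digit.
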
Comparing regex in qt

I have a regex which I hope matches any file with one of the listed extensions:
((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))
How do I compare it in Qt against a selected file?
Your actual RegEx itself doesn't have double backslashes (just when you fit it into a string literal). And you'll need some kind of wildcard if you want to use it to match full filenames. There's a semantic issue of whether you want a file called just ".cpp" to match or not. What about case sensitivity?
I'll assume for the moment that you want at least one other character in the beginning and use .+:
.+((\.cpp$)|(\.cxx$)|(\.c$)|(\.hpp$)|(\.h$))
So this should work:
QRegExp rx (".+((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))");
bool isMatch = rx.exactMatch(filename);
But with the expressive power of a whole C++ compiler at your beck and call, it can be a bit stifling to use regular expressions. You might have an easier time adapting code if you write it more like:
bool isMatch = false;
QStringList fileExtensionList;
fileExtensionList << "CPP" << "CXX" << "C" << "HPP" << "H";
QStringList splitFilenameList = filename.split(".");
if(splitFilenameList.size() > 1) {
QString fileExtension = splitFilenameList[splitFilenameList.size() - 1];
isMatch = fileExtensionList.contains(fileExtension.toUpper());
}
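If you want to stay with a regex, the case-sensitivity question can be handled with a flag. The following is a rough sketch using std::regex instead of QRegExp (an assumption, so that it builds without Qt); hasSourceExtension is a hypothetical name. Because regex_match must consume the whole string, the $ anchors are not needed.

```cpp
#include <regex>
#include <string>

// True when the file name has at least one character before the dot and
// ends in one of the listed extensions, compared case-insensitively.
bool hasSourceExtension(const std::string& filename) {
    static const std::regex re(R"(.+\.(cpp|cxx|c|hpp|h))",
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(filename, re);
}
```

With this pattern "main.cpp" and "Widget.H" are accepted, while a file called just ".cpp" is rejected because of the leading .+.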

PCRECPP (pcre) extract hostname from url code problem

I have this simple piece of code in C++:
int main(void)
{
string text = "http://www.amazon.com";
string a,b,c,d,e,f;
pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?#)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
if(re.PartialMatch(text, &a,&b,&c,&d,&e,&f))
{
std::cout << "match: " << f << "\n";
// should print "www.amazon.com"
}else{
std::cout << "no match. \n";
}
return 0;
}
When I run this, it doesn't find a match.
I'm pretty sure that the regex pattern is correct and that my code is what's wrong.
If anyone familiar with pcrecpp can take a look at this, I'll be grateful.
EDIT:
Thanks to Dingo, it works great.
Another issue I had is that the result was in the sixth capture, "f".
I edited the code above so you can copy/paste if you wish.
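The problem is that your code contains ??( which is a trigraph in C++ for [. The compiler replaces it before the regex library ever sees the string, so the pattern that reaches pcrecpp is not the one you wrote. You'll either need to disable trigraphs or break the sequence up, for example by splitting the string literal: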
pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?#)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??" "([^#]+)?#?(\\w*)");
Please do
cout << re.pattern() << endl;
to double-check that all your double-slashing is done right (and also post the result).
Looks like
^((\w+):///?)?((\w+):?(\w+)?#)?([^/\?:]+):?(\d+)?(/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)
The hostname isn't going to be returned from the first capture group. Why are you using parentheses around parts, for example \w+, that you don't want to capture?
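The literal-splitting trick mentioned above can be checked in isolation. This is only a sketch: queryTail is a hypothetical helper that returns the tail of the pattern from the question. Adjacent string literals are concatenated by the compiler after trigraph replacement has already happened, so the resulting text is the intended one even when trigraphs are enabled.

```cpp
#include <string>

// Tail of the URL pattern from the question, starting at the escaped
// question mark. Written as one literal, two question marks would sit
// directly in front of an opening parenthesis and form a trigraph.
// Splitting the literal keeps those three characters apart in the source;
// the compiler joins the two pieces back into a single string.
std::string queryTail() {
    return "\\??" "([^#]+)?#?(\\w*)";
}
```

Printing queryTail() shows the two question marks followed by the parenthesis, which is what the regex engine is supposed to receive.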