Qt Using QRegularExpression multiline option - c++

I'm writing a program that uses QRegularExpression with MultilineOption. I wrote this code, but matching stops on the first line. Why? What am I doing wrong?
QString recv = "AUTH-<username>-<password>\nINFO-ID:45\nREG-<username>-<password>-<name>-<status>\nSEND-ID:195-DATE:12:30 2/02/2015 <esempio>\nUPDATEN-<newname>\nUPDATES-<newstatus>\n";
QRegularExpression exp = QRegularExpression("(SEND)-ID:(\\d{1,4})-DATE:(\\d{1,2}):(\\d{1,2}) (\\d{1,2})/(\\d{1,2})/(\\d{2,4}) <(.+)>\\n|(AUTH)-<(.+)>-<(.+)>\\n|(INFO)-ID:(\\d{1,4})\\n|(REG)-<(.+)>-<(.+)>-<(.+)>-<(.+)>\\n|(UPDATEN)-<(.+)>\\n|(UPDATES)-<(.+)>\\n", QRegularExpression::MultilineOption);
qDebug() << exp.pattern();
QRegularExpressionMatch match = exp.match(recv);
qDebug() << match.lastCapturedIndex();
for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
    qDebug() << match.captured(i);
}
Can someone help me?

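Use the globalMatch() method rather than match(). match() returns only the first match, whereas globalMatch() returns an iterator over every match in the subject string.
See the QRegularExpression documentation on globalMatch():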
Attempts to perform a global match of the regular expression against
the given subject string, starting at the position offset inside the
subject, using a match of type matchType and honoring the given
matchOptions. The returned QRegularExpressionMatchIterator is
positioned before the first match result (if any).
Also, you can remove the QRegularExpression::MultilineOption option: it only changes how the ^ and $ anchors behave, and your pattern uses neither.
Sample code:
QRegularExpressionMatchIterator i = exp.globalMatch(recv);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    // ...
}
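To make the "first match only" versus "all matches" difference concrete, here is a minimal sketch that does not need Qt. It uses std::regex from the C++ standard library as a stand-in: std::regex_search plays the role of match() and std::sregex_iterator plays the role of globalMatch(). The function names and the simplified keyword pattern are my own assumptions, not part of the question's code.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Simplified pattern (an assumption for illustration): a known keyword,
// a dash, then the rest of the line up to and including the newline.
static const std::regex kRecord("(AUTH|INFO|REG|SEND|UPDATEN|UPDATES)-[^\n]*\n");

// Analogue of match(): stops at the first record it finds.
std::string firstCommand(const std::string& text) {
    std::smatch m;
    return std::regex_search(text, m, kRecord) ? m[1].str() : std::string();
}

// Analogue of globalMatch(): walks every record in the text.
std::vector<std::string> allCommands(const std::string& text) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), kRecord), end; it != end; ++it) {
        out.push_back((*it)[1].str());
    }
    return out;
}
```

Called on the recv string from the question, firstCommand() only ever sees the AUTH record, while allCommands() visits each line in turn.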

I found this question while googling a similar issue, but I don't completely agree with the answer above. I think most questions about multi-line matching with the new QRegularExpression can be answered as follows:
use the QRegularExpression::DotMatchesEverythingOption option, which allows the dot (.) to match newline characters. This is extremely useful when porting from QRegExp.

Your pattern is an alternation (|): as soon as the first alternative matches, match() is done.
If you split the string into lines and loop over the resulting list, testing each line against this expression, it should work, I think.
If the data always has the same structure, you can use something like this:
"(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>"
21 matches (one per capture group)
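The split-and-loop idea could look roughly like the sketch below. It is written against the C++ standard library (std::getline and std::regex_match) rather than Qt, so treat it as an illustration of the approach only; the function name matchingLines is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Splits `text` on '\n' and returns the lines that match `re` in full.
// Because each line is tested on its own, the pattern no longer needs
// a trailing "\n" in every alternative.
std::vector<std::string> matchingLines(const std::string& text, const std::regex& re) {
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_match(line, re)) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

A design note: matching line by line also means a malformed line simply fails on its own instead of affecting how its neighbours are matched.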

Related

Extract string matching a specific format

Given a QString, I want to extract a substring from the main string input.
e.g. I have a QString reading something like:
\\\\?\\Volume{db41aa6a-c0b8-11e9-bc8a-806e6f6e6963}\\
I need to extract the string (if a string with the format exists) using a template/format matching a regex format (\w){8}([-](\w){4}){3}[-](\w){12} as shown below:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
and it should return
db41aa6a-c0b8-11e9-bc8a-806e6f6e6963
if found, else an empty QString.
Currently, I can achieve this by doing something like:
string.replace("{", "").replace("}", "").replace("\\", "").replace("?", "").replace("Volume", "");
But this is tedious and inefficient, and tailored to a specific request.
Is there a generalized function that enables me to extract a substring using a regex format or other?
Update
To clarify after @Emma's answer, I want e.g. QString::extract("(\w){8}([-](\w){4}){3}[-](\w){12}") which returns db41aa6a-c0b8-11e9-bc8a-806e6f6e6963.
Here's a bunch of ways to extract part of a string as presented in the question. I don't know how much of the string format is fixed vs. variable, so possibly not all of these examples would be practical. Also some examples below are using QStringRef class which can be more efficient but must have the original string (the one being referenced) available while any references are active (see warning in docs).
const QString str("\\\\?\\Volume{db41aa6a-c0b8-11e9-bc8a-806e6f6e6963}\\");
// Treat str as a list delimited by "{" and "}" chars.
const QString sectResult = str.section('{', 1, 1).section('}', 0, 0); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
const QString sectRxResult = str.section(QRegExp("\\{|\\}"), 1, 1); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
// Example using QStringRef, though this could also be just QString::split() which returns QString copies.
const QVector<QStringRef> splitRef = str.splitRef(QRegExp("\\{|\\}"));
const QStringRef splitRefResult = splitRef.value(1); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
// Use regular expressions to find/extract matching string
const QRegularExpression rx("\\w{8}(?:-(\\w){4}){3}-\\w{12}"); // match a UUID string
const QRegularExpressionMatch match = rx.match(str);
const QString rxResultStr = match.captured(0); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
const QStringRef rxResultRef = match.capturedRef(0); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
const QRegularExpression rx2(".+\\{([^{\\}]+)\\}.+"); // capture anything inside { } brackets
const QRegularExpressionMatch match2 = rx2.match(str);
const QString rx2ResultStr = match2.captured(1); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
// Make a copy for replace so that our references to the original string remain valid.
const QString replaceResult = QString(str).replace(rx2, "\\1"); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
qDebug() << sectResult << sectRxResult << splitRefResult << rxResultStr
<< rxResultRef << rx2ResultStr << replaceResult;
Maybe,
Volume{(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)}
or just,
\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b
for a full match might be a bit closer.
If you wish to simplify, update, or explore the expression, it is explained in the top right panel of regex101.com. If you are interested, the regex101 debugger lets you watch the matching steps or modify them; it demonstrates how a regex engine might consume some sample input strings step by step and perform the matching process.
Source
Searching for UUIDs in text with regex
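For the "generalized extract function" the question asks about, a small helper along these lines may be enough. This is a sketch using std::regex from the standard library instead of Qt, and extractFirst is a hypothetical name; with Qt the same shape would wrap QRegularExpression::match() and captured(0) as shown in the answer above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first substring of `text` that matches `pattern`,
// or an empty string when nothing matches.
std::string extractFirst(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Passing the hex-digit UUID pattern from this answer and the Volume{...} string from the question should give back just the UUID; a string without a UUID gives back an empty string.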

Extract the string between two words using RegEx in QT [duplicate]

can anybody help me with this?
I have a string which contains N substrings, delimited by tags and I have to get ALL of the substrings. The string is like
STARTfoo barENDSTARThi there!ENDSTARTstackoverflowrulezEND
I would like to get all the strings between START/END tags, I tried with a couple of regular expressions with no luck:
(START)(.*)(END) gives me ALL the content between the first and last tag
(START)(\w+)(END) gives me no result
The code is quite simple:
QString l_str = "STARTfoo barENDSTARThi there!ENDSTARTstackoverflowrulezEND";
QRegExp rx("(START)(\w+)(END)");
QStringList list;
int pos = 0;
while ((pos = rx.indexIn(l_str, pos)) != -1)
{
list << rx.cap(1);
pos += rx.matchedLength();
}
qWarning() << list;
I'd like a resulting list like:
STARTfoo barEND
STARThi there!END
STARTstackoverflowrulezEND
Any help?
Thanks!
Use rx.setMinimal(true) with .* to make it lazy:
QRegExp rx("START.*END");
rx.setMinimal(true);
See the QRegExp::setMinimal docs:
Enables or disables minimal matching. If minimal is false, matching is greedy (maximal) which is the default.
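To see what lazy matching changes, here is a minimal sketch with std::regex from the standard library (not QRegExp). In that engine the lazy quantifier is written inline as .*? instead of calling setMinimal(true). The function name taggedChunks is an assumption for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects every START...END chunk. The lazy ".*?" stops at the nearest
// END, so each tag pair becomes its own match instead of one big match
// running from the first START to the last END.
std::vector<std::string> taggedChunks(const std::string& text) {
    static const std::regex re("START.*?END");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.push_back(it->str());  // whole match, tags included
    }
    return out;
}
```

Note that the whole match (group 0) is what gets collected here. In the question's loop, rx.cap(1) would return only the first capture group, so rx.cap(0) is the call that yields the full START...END text.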

In Qt, what takes the least amount of code to replace string matches with regular expression captures?

I was hoping that QString would allow this:
QString myString("School is LameCoolLame and LameRadLame");
myString.replace(QRegularExpression("Lame(.+?)Lame"),"\1");
Leaving
"School is Cool and Rad"
Instead, from what I saw in the docs, doing this is a lot more convoluted and requires something like this (from the docs):
QRegularExpression re("\\d\\d \\w+");
QRegularExpressionMatch match = re.match("abc123 def");
if (match.hasMatch()) {
    QString matched = match.captured(0); // matched == "23 def"
    // ...
}
Or in my case something like this:
QString myString("School is LameCoolLame and LameRadLame");
QRegularExpression re("Lame(.+?)Lame");
QRegularExpressionMatch match = re.match(myString);
if (match.hasMatch()) {
for (int i = 0; i < myString.count(re); i++) {
QString newString(match.captured(i));
myString.replace(myString.indexOf(re),re.pattern().size, match.captured(i));
}
}
And that doesn't even seem to work, (I gave up actually). There must be an easier more convenient way. For the sake of simplicity and code readability, I'd like to know the methods which take the least lines of code to accomplish this.
Thanks.
QString myString("School is LameCoolLame and LameRadLame");
myString.replace(QRegularExpression("Lame(.+?)Lame"),"\\1");
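The code above works as you expected. In your version, you forgot to escape the escape character itself: in a C++ string literal, "\1" is a single character with code 1, whereas "\\1" is a backslash followed by 1, which is the back-reference that replace() looks for.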
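The escaping point is plain C++ and can be checked without Qt. The sketch below also shows the same replacement done with std::regex_replace from the standard library, purely as a stand-in; note that std::regex writes the back-reference as $1, while QString::replace uses \1 (typed as "\\1" in source). The names here are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "\1" is an octal escape: one character whose value is 1.
const std::string kOneChar = "\1";
// "\\1" is two characters: a backslash followed by the digit 1.
const std::string kBackref = "\\1";

// Standard-library analogue of the one-line replace from the answer.
std::string unwrapLame(const std::string& s) {
    static const std::regex re("Lame(.+?)Lame");
    return std::regex_replace(s, re, "$1");  // $1 = first capture group
}
```

So the one-liner in the answer is already about as short as it gets; the only thing that went wrong in the question was the string literal.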

Comparing regex in qt

I have a regex which I hope means any file with extension listed:
((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))
How to compare it in Qt against selected file?
Your actual RegEx itself doesn't have double backslashes (just when you fit it into a string literal). And you'll need some kind of wildcard if you want to use it to match full filenames. There's a semantic issue of whether you want a file called just ".cpp" to match or not. What about case sensitivity?
I'll assume for the moment that you want at least one other character in the beginning and use .+:
.+((\.cpp$)|(\.cxx$)|(\.c$)|(\.hpp$)|(\.h$))
So this should work:
QRegExp rx (".+((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))");
bool isMatch = rx.exactMatch(filename);
But with the expressive power of a whole C++ compiler at your beck and call, it can be a bit stifling to use regular expressions. You might have an easier time adapting code if you write it more like:
bool isMatch = false;
QStringList fileExtensionList;
fileExtensionList << "CPP" << "CXX" << "C" << "HPP" << "H";
QStringList splitFilenameList = filename.split(".");
if (splitFilenameList.size() > 1) {
    // The extension is whatever follows the last dot.
    QString fileExtension = splitFilenameList[splitFilenameList.size() - 1];
    // Compare in upper case so the check is case-insensitive.
    isMatch = fileExtensionList.contains(fileExtension.toUpper());
}
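For comparison, the same extension check can be sketched with only the C++ standard library. This is not the answer's Qt code: hasSourceExtension is a hypothetical name, it expects a bare file name rather than a full path, and (as an assumption matching the .+ regex above) it rejects a file called just ".cpp".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Case-insensitive check of the text after the last dot against a
// fixed list of source-file extensions.
bool hasSourceExtension(const std::string& filename) {
    static const std::set<std::string> exts = {"CPP", "CXX", "C", "HPP", "H"};
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot == 0) {
        return false;  // no extension at all, or nothing before the dot
    }
    std::string ext = filename.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return exts.count(ext) > 0;
}
```

Keeping the list of extensions in a container makes it easy to extend later, which is the adaptability argument made above.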