Extract the string between two words using RegEx in Qt [duplicate] - c++

Can anybody help me with this?
I have a string that contains N substrings delimited by tags, and I have to get ALL of the substrings. The string looks like this:
STARTfoo barENDSTARThi there!ENDSTARTstackoverflowrulezEND
I would like to get all the strings between the START/END tags. I tried a couple of regular expressions with no luck:
(START)(.*)(END) gives me ALL the content between the first and last tag
(START)(\w+)(END) gives me no result
The code is quite simple:
QString l_str = "STARTfoo barENDSTARThi there!ENDSTARTstackoverflowrulezEND";
QRegExp rx("(START)(\w+)(END)");
QStringList list;
int pos = 0;
while ((pos = rx.indexIn(l_str, pos)) != -1)
{
    list << rx.cap(1);
    pos += rx.matchedLength();
}
qWarning() << list;
I'd like a resulting list like:
STARTfoo barEND
STARThi there!END
STARTstackoverflowrulezEND
Any help?
Thanks!

Use rx.setMinimal(true) with .* to make the match lazy, and read each whole match with rx.cap(0), since this pattern has no capture groups:
QRegExp rx("START.*END");
rx.setMinimal(true);
See the QRegExp::setMinimal docs:
Enables or disables minimal matching. If minimal is false, matching is greedy (maximal) which is the default.
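For readers without Qt at hand, the same greedy-versus-lazy behaviour can be sketched with the C++ standard library. This is not the QRegExp API: std::regex has no setMinimal, so the sketch assumes the lazy quantifier `.*?` as the equivalent, and the function name extractTagged is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every START...END span, tags included.
// ".*?" is the lazy quantifier: it stops at the first END it can reach,
// which is the effect setMinimal(true) has on ".*" in QRegExp.
// (extractTagged is a hypothetical helper name, not part of Qt.)
std::vector<std::string> extractTagged(const std::string &text)
{
    static const std::regex rx("START.*?END");
    std::vector<std::string> result;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        result.push_back(it->str());  // whole match, comparable to rx.cap(0)
    return result;
}
```

Called on the string from the question, this should return the three tagged substrings rather than one span running from the first START to the last END.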

Related

How to split QString based on a given character length?

I am trying to split a QString into groups of 19 characters.
Here is the string:
+1.838212011719E+04-1.779050827026E+00 3.725290298462E-09 0.000000000000E+00
I wish to split it into:
+1.838212011719E+04
-1.779050827026E+00
3.725290298462E-09
0.000000000000E+00
I have tried using QRegularExpression, but I could not come up with a solution.
How can I do this?
Solution
I would suggest using a loop instead of a regular expression.
Example
Here is an example I have prepared for you of how to implement this in C++:
bool splitString(const QString &str, int n, QStringList &list)
{
    if (n < 1)
        return false;
    QString tmp(str);
    list.clear();
    while (!tmp.isEmpty()) {
        list.append(tmp.left(n));
        tmp.remove(0, n);
    }
    return true;
}
Note: Optionally you can use QString::trimmed(), i.e. list.append(tmp.left(n).trimmed());, in order to get rid of the leading whitespace.
Result
Testing the example with your input:
QStringList list;
if (splitString("+1.838212011719E+04-1.779050827026E+00 3.725290298462E-09 0.000000000000E+00", 19, list))
    qDebug() << list;
produces the following results:
without QString::trimmed()
("+1.838212011719E+04", "-1.779050827026E+00", " 3.725290298462E-09", " 0.000000000000E+00")
with QString::trimmed()
("+1.838212011719E+04", "-1.779050827026E+00", "3.725290298462E-09", "0.000000000000E+00")
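The loop above copies the input and removes its front on every pass. A variant that walks by index instead can be sketched as follows. It uses std::string rather than QString so that it compiles without Qt, and the name splitFixedWidth is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split text into pieces of at most n characters, walking by index
// so the input is never copied or modified.
// Returns an empty vector if n is 0.
// (splitFixedWidth is a hypothetical helper name.)
std::vector<std::string> splitFixedWidth(const std::string &text, std::size_t n)
{
    std::vector<std::string> parts;
    if (n == 0)
        return parts;
    for (std::size_t pos = 0; pos < text.size(); pos += n)
        parts.push_back(text.substr(pos, n));  // substr clamps the final, possibly shorter piece
    return parts;
}
```

With the 76-character input from the question and n = 19 this should give the same four pieces as the untrimmed result above, leading spaces included.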
Use this regular expression:
^(.{19})(.{19})(.{19})(.{19})
I would also recommend using a tool like RegEx101. Give it a try and see what happens.
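The pattern above is fixed at exactly four groups. One way to handle any number of groups is to match a single group of up to n characters repeatedly. The sketch below assumes std::regex in place of QRegularExpression so that it stands alone, and splitByRegex is a made-up name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split text into pieces of up to n characters by repeatedly matching ".{1,n}".
// Note: "." does not match line terminators in the default std::regex grammar,
// so this sketch assumes single-line input.
// (splitByRegex is a hypothetical helper name.)
std::vector<std::string> splitByRegex(const std::string &text, int n)
{
    std::vector<std::string> parts;
    if (n < 1)
        return parts;
    const std::regex rx(".{1," + std::to_string(n) + "}");
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        parts.push_back(it->str());
    return parts;
}
```

Because the quantifier is greedy, each match takes n characters where available, and the last match takes whatever remains.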

C++ , Regular expression

I know how to find all matches of a regular expression in a string. How do I find only the first match?
Here is my code:
QString mangledText;
QRegExp rx("string");
int pos = 0;
while ((pos = rx.indexIn(mangledText)) != -1){
    mangledText.replace(pos, rx.matchedLength(), "replaced string");
}
I want to replace only the first match (or the second, or the third) instead of all of them.
Any suggestions?
I want to replace first match result instead of all of that.
Use an if instead of a while.
if ((pos = rx.indexIn(mangledText)) != -1){
    mangledText.replace(pos, rx.matchedLength(), "replaced string");
}
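The question also asks about replacing the second or third match, which the if version does not cover. A minimal sketch of one way to do that is to count matches and replace only the n-th. It uses std::regex instead of QRegExp so that it compiles on its own, and replaceNth is a made-up helper name.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Replace only the n-th match (1-based) of rx in text.
// Returns text unchanged if there are fewer than n matches.
// (replaceNth is a hypothetical helper name.)
std::string replaceNth(const std::string &text, const std::regex &rx,
                       int n, const std::string &replacement)
{
    int count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        if (++count == n) {
            std::string result = text;
            result.replace(static_cast<std::size_t>(it->position()),
                           static_cast<std::size_t>(it->length()),
                           replacement);
            return result;
        }
    }
    return text;
}
```

Because the replacement is made once on a copy, a replacement text that itself contains the pattern (such as "replaced string" for the pattern "string") cannot trigger further replacements.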

Qt Using QRegularExpression multiline option

I'm writing a program that uses QRegularExpression with MultilineOption. I wrote this code, but matching stops on the first line. Why? What am I doing wrong?
QString recv = "AUTH-<username>-<password>\nINFO-ID:45\nREG-<username>-<password>-<name>-<status>\nSEND-ID:195-DATE:12:30 2/02/2015 <esempio>\nUPDATEN-<newname>\nUPDATES-<newstatus>\n";
QRegularExpression exp = QRegularExpression("(SEND)-ID:(\\d{1,4})-DATE:(\\d{1,2}):(\\d) (\\d{1,2})\/(\\d)\/(\\d{2,4}) <(.+)>\\n|(AUTH)-<(.+)>-<(.+)>\\n|(INFO)-ID:(\\d{1,4})\\n|(REG)-<(.+)>-<(.+)>-<(.+)>-<(.+)>\\n|(UPDATEN)-<(.+)>\\n|(UPDATES)-<(.+)>\\n", QRegularExpression::MultilineOption);
qDebug() << exp.pattern();
QRegularExpressionMatch match = exp.match(recv);
qDebug() << match.lastCapturedIndex();
for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
    qDebug() << match.captured(i);
}
Can someone help me?
You should use the globalMatch method rather than match: match returns only the first match, while globalMatch returns an iterator over all of them.
See QRegularExpression documentation on that:
Attempts to perform a global match of the regular expression against
the given subject string, starting at the position offset inside the
subject, using a match of type matchType and honoring the given
matchOptions. The returned QRegularExpressionMatchIterator is
positioned before the first match result (if any).
Also, you can remove the QRegularExpression::MultilineOption option, as it is not being used: it only changes how ^ and $ behave, and the pattern contains neither.
Sample code:
QRegularExpressionMatchIterator i = exp.globalMatch(recv);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    // ...
}
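To show the iterate-over-all-matches idea outside Qt, here is a sketch using std::sregex_iterator as a rough analogue of QRegularExpressionMatchIterator. The pattern keeps only three of the alternatives from the question, and commandNames is a made-up helper name.

```cpp
#include <cstddef>
#include <initializer_list>
#include <regex>
#include <string>
#include <vector>

// Return the keyword of every message found in text, in order.
// Each alternative has its own keyword group; for a given match only the
// group belonging to the alternative that matched is set.
// (commandNames is a hypothetical helper; the pattern is a simplified
// subset of the one in the question.)
std::vector<std::string> commandNames(const std::string &text)
{
    static const std::regex rx(
        "(AUTH)-<(.+)>-<(.+)>\\n|(INFO)-ID:(\\d{1,4})\\n|(UPDATEN)-<(.+)>\\n");
    std::vector<std::string> names;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        const std::smatch &m = *it;
        for (std::size_t group : {1u, 4u, 6u}) {  // keyword groups of the three alternatives
            if (m[group].matched)
                names.push_back(m[group].str());
        }
    }
    return names;
}
```

A single search would stop after the AUTH line. Iterating continues from the end of each match, so the INFO and UPDATEN lines are found as well.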
I actually found this question via Google while having a similar issue, but I can't completely agree with the answer. I think most questions about multi-line matching with the new QRegularExpression can be answered as follows:
Use the QRegularExpression::DotMatchesEverythingOption option, which allows the dot (.) to match newline characters. This is extremely useful when porting from QRegExp.
Your pattern is an alternation (an "or" expression): as soon as the first alternative matches, the job is done.
If you split the string and loop over the resulting array, comparing each entry with this expression, it should work, I think.
If the data always has the same structure, you can use something like this:
"(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>"
This gives 21 captured groups.
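The split-and-loop suggestion above could look roughly like this. It is a sketch under assumptions: it uses std::getline and std::regex_match from the standard library rather than QString::split, and matchingLines is a made-up name.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Split text on '\n' and keep the lines that match rx in full.
// The per-line pattern therefore needs no trailing "\\n".
// (matchingLines is a hypothetical helper name.)
std::vector<std::string> matchingLines(const std::string &text, const std::regex &rx)
{
    std::vector<std::string> hits;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        if (std::regex_match(line, rx))  // the whole line must match
            hits.push_back(line);
    }
    return hits;
}
```

Matching line by line keeps each alternative simple, at the cost of not being able to match anything that spans several lines.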

Qt and QtRegExp for parsing html tags

I want to parse out html tag name. My code is this:
QRegExp exp("<\\s*(\\w+)\\s*");
exp.indexIn("<html> hi there </html>");
qDebug() << exp.cap(1);
It's logging "h" instead of "html". Why? As far as I understand it, the \w+ should find a string of one or more word characters, in this case "html". Since it doesn't, what would be the right way to achieve this?
That's because you are calling indexIn without iterating over the matches, and you aren't using the capturing groups to access your captured data.
Use this code instead:
QRegExp rx("<\\s*(\\w+)\\s*");
QString str = "<html> hi there </html>";
QStringList list;
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
    list << rx.cap(1);
    pos += rx.matchedLength();
}
// list: ["html"]
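For comparison, the same pattern can be tried without Qt. The sketch below assumes std::regex in place of QRegExp, and tagNames is a made-up helper name. It collects capture group 1 from every match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the names of opening tags: "<", optional whitespace, then word characters.
// Closing tags such as "</html>" are skipped because "/" is neither
// whitespace nor a word character.
// (tagNames is a hypothetical helper name.)
std::vector<std::string> tagNames(const std::string &html)
{
    static const std::regex rx("<\\s*(\\w+)\\s*");
    std::vector<std::string> names;
    for (std::sregex_iterator it(html.begin(), html.end(), rx), end; it != end; ++it)
        names.push_back((*it)[1].str());  // capture group 1 holds the tag name
    return names;
}
```

As with any regex-based approach, this is only suitable for simple, well-formed input; it is not an HTML parser.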