Match nested capture groups with quantifiers using QRegularExpression - c++

I'm trying to use QRegularExpression to capture all attributes of an XML tag in separate capture groups. My regex matches the tag and captures the attribute value, but because the group sits under a quantifier, I only get the last value.
I use this regex:
<[a-z]+(?: [a-z]+=("[^"]*"))*>
And I would like to get "a" and "b" with this text :
<p a="a" b="b">
Here is the code:
const QString text { "<p a=\"a\" b=\"b\">" };
const QRegularExpression pattern { "<[a-z]+(?: [a-z]+=(\"[^\"]*\"))*>" };
QRegularExpressionMatchIterator it = pattern.globalMatch(text);
while (it.hasNext())
{
    const QRegularExpressionMatch match = it.next();
    qDebug() << "Match with" << match.lastCapturedIndex() + 1 << "captured groups";
    for (int i { 0 }; i <= match.lastCapturedIndex(); ++i)
        qDebug() << match.captured(i);
}
And the output :
Match with 2 captured groups
"<p a=\"a\" b=\"b\">"
"\"b\""
Is it possible to get multiple capture groups with the quantifier *, or do I have to iterate using QRegularExpressionMatchIterator with a specific regex on the string literals?

This expression captures each attribute on its own, without anchoring to the surrounding tag, so a global search returns one match per attribute:
([A-Za-z]+)(=\x22)([A-Za-z]+)(\x22)
If you would like to add boundaries to it, you can extend it, maybe to something similar to:
(?:^<p )?([A-Za-z]+)(=\x22)([A-Za-z]+)(\x22)
Test for RegEx
const regex = /(?:^<p )?([A-Za-z]+)(=\x22)([A-Za-z]+)(\x22)/gm;
const str = `<p attributeA="foo" attributeB="bar" attributeC="baz" attributeD="qux"></p>`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
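On the original question: as far as I know, PCRE-style engines (which QRegularExpression wraps) keep only the last iteration of a repeated capture group, so a second pass over the matched tag is the usual workaround. Below is a minimal JavaScript sketch of that two-step idea, in the same language as the test above. The helper name attributeValues is hypothetical, and the same approach should carry over to QRegularExpression::globalMatch with the inner attribute pattern.

```javascript
// Hypothetical helper: returns the quoted attribute values of the first opening tag.
function attributeValues(text) {
  // Step 1: match the whole tag. The repeated group only keeps its last capture.
  const tag = /<[a-z]+(?: [a-z]+=("[^"]*"))*>/.exec(text);
  if (tag === null) {
    return [];
  }
  // Step 2: iterate over every attribute inside the matched tag.
  const values = [];
  for (const m of tag[0].matchAll(/ [a-z]+=("[^"]*")/g)) {
    values.push(m[1]);
  }
  return values;
}

console.log(attributeValues('<p a="a" b="b">')); // [ '"a"', '"b"' ]
```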

Related

Return only numbers from string using Google Analytics regex

I have a URL, lets say:
google.com/?ZipCode=77007
How can I return only the number part of the URL? I'm using google analytics regex.
I tried something like this:
\d{5}
and it matches the URL but doesn't isolate only the number.
Thanks!
If we wish to just get the zip code, these expressions might likely work:
ZipCode=([0-9]+)
ZipCode=([0-9]{5})
ZipCode=(\d+)
ZipCode=(\d{5})
Each of these adds a capturing group (), which the original \d{5} lacks; I'm guessing that is the issue here. The digits are then available as group 1.
Demo
const regex = /ZipCode=(\d+)/gm;
const str = `google.com/?ZipCode=77007`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
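If only the number is needed, reading group 1 directly is enough. Here is a small sketch in plain JavaScript; zipCode is a hypothetical helper name, and whether Google Analytics exposes the group the same way is an assumption to verify in its own settings.

```javascript
// Hypothetical helper: returns the digits after "ZipCode=", or null when absent.
function zipCode(url) {
  const m = /ZipCode=(\d+)/.exec(url);
  return m === null ? null : m[1];
}

console.log(zipCode("google.com/?ZipCode=77007")); // 77007
```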

Regular expression with two unique requirements

I would like a regular expression that matches the following string:
"( one , two,three ,four, '')"
and extracts the following:
"one"
"two"
"three"
""
There could be any number of elements. The Regular expression:
"[a-zA-Z]+|(?<=')\\s*(?=')"
works, but the library I am using is not compatible with look-around assertions.
Do I have any options?
This expression would likely capture what we might want to extract here:
(\s+)?([A-Za-z]+)(\s+)?|'(.+)?'
We might not need any additional boundaries here; the desired outputs are in these two groups (groups 2 and 4):
([A-Za-z]+)
(.+)
Test
const regex = /(\s+)?([A-Za-z]+)(\s+)?|'(.+)?'/gm;
const str = `"( one , two,three ,four, '')"`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
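To collect the items into a single list without look-around, one option is to read whichever group matched. The sketch below uses a simplified variant of the expression, ([A-Za-z]+)|'([^']*)', which is my assumption rather than the exact pattern above; listItems is a hypothetical helper name.

```javascript
// Hypothetical helper: collects bare words and single-quoted items, in order.
function listItems(text) {
  const items = [];
  // Group 1: a bare word. Group 2: the content of a single-quoted item (may be empty).
  const pattern = /([A-Za-z]+)|'([^']*)'/g;
  for (const m of text.matchAll(pattern)) {
    items.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return items;
}

console.log(listItems(`"( one , two,three ,four, '')"`));
// [ 'one', 'two', 'three', 'four', '' ]
```

Using [^']* instead of .+ keeps a quoted item from running past its closing quote when several quoted items appear on one line.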

RegEx for matching and excluding ' and "

I know that to negate a character like ' I can write [^'].
But I want to match any character (repeated zero or more times) that is neither a single nor a double quote:
"[^'""]*"
Is this the right syntax?
This expression might help you to do so:
([^"'])*
You might also want to use:
([^\x22\x27])*
You can simplify this by moving the quantifier inside the capturing group, so that the group captures everything except ' and ":
([^\x27\x22]*)
JavaScript Test
const regex = /([^\x27\x22]*)/gm;
const str = `anything else9*F&(A*&Fa09s7f'"'''"afa'"adfadsf`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
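A compact way to see what the negated character class does is to pull out every run of characters that contains no quote. This is only a sketch; unquotedRuns is a hypothetical helper name, and it uses + rather than * so that empty matches are skipped.

```javascript
// Hypothetical helper: returns every maximal run of characters free of ' and ".
function unquotedRuns(text) {
  // \x27 is the single quote, \x22 the double quote.
  return text.match(/[^\x27\x22]+/g) || [];
}

console.log(unquotedRuns(`ab'cd"ef`)); // [ 'ab', 'cd', 'ef' ]
```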

Regular expression for match string with new line char

How can I use a regular expression to extract the passphrase between the Passphrase= string and the \n sequence (expected result: testpasssword)? The passphrase can contain any characters.
My partial solution: Passphrase.*(?=\\nName) => Passphrase=testpasssword
[wifi_d0b5c2bc1d37_7078706c617967726f756e64_managed_psk]\nPassphrase=testpasssword\nName=pxplayground\nSSID=9079706c697967726f759e69\nFrequency=2462\nFavorite=true\nAutoConnect=true\nModified=2018-06-18T09:06:26.425176Z\nIPv4.method=dhcp\nIPv4.DHCP.LastAddress=0.0.0.0\nIPv6.method=auto\nIPv6.privacy=disabled\n
With QRegularExpression that supports PCRE regex syntax, you may use
QString str = "your_string";
QRegularExpression rx(R"(Passphrase=\K.+?(?=\\n))");
qDebug() << rx.match(str).captured(0);
The R"(Passphrase=\K.+?(?=\\n))" is a raw string literal defining the regex pattern Passphrase=\K.+?(?=\\n). It matches Passphrase=, drops the matched text with the match reset operator \K, and then matches 1 or more chars, as few as possible, up to the first \ char followed by the letter n.
You may use a capturing group approach that looks simpler though:
QRegularExpression rx(R"(Passphrase=(.+?)\\n)");
qDebug() << rx.match(str).captured(1); // Here, grab Group 1 value!
The only things you were missing are the lazy quantifier, which tells the regex to match only as much as necessary, and a positive lookbehind. The first is a simple question mark after the plus; the second prefixes the text you want to match but not include, written as (?<=...). Check the code example to see it in action.
(?<=Passphrase=).+?(?=\\n)
const regex = /(?<=Passphrase=).+?(?=\\n)/gm;
const str = `[wifi_d0b5c2bc1d37_7078706c617967726f756e64_managed_psk]\\nPassphrase=testpasssword\\nName=pxplayground\\nSSID=9079706c697967726f759e69\\nFrequency=2462\\nFavorite=true\\nAutoConnect=true\\nModified=2018-06-18T09:06:26.425176Z\\nIPv4.method=dhcp\\nIPv4.DHCP.LastAddress=0.0.0.0\\nIPv6.method=auto\\nIPv6.privacy=disabled\\n
`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
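Where lookbehind is unavailable, the capturing-group approach works in JavaScript as well. The sketch below assumes the input may contain either the two-character sequence \n (as in the sample) or a real line break; passphrase is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the text after "Passphrase=" up to the next
// literal backslash-n, real newline, or end of input; null if not found.
function passphrase(config) {
  const m = /Passphrase=(.+?)(?:\\n|\n|$)/.exec(config);
  return m === null ? null : m[1];
}

// Input with the literal two-character sequence \n, as in the question.
console.log(passphrase("Passphrase=testpasssword\\nName=pxplayground\\n")); // testpasssword
// Input with real line breaks.
console.log(passphrase("Passphrase=testpasssword\nName=pxplayground\n")); // testpasssword
```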

Regex to find a string starting with a word and ending with either ? or end of line, but not containing a specific word

For example, I have the following URLs in different formats and want to capture a specific part (the page identifier):
Home: https://www.example.com/course/home#/
courseSummary: https://www.example.com/tag/mypage/course/#/courseSummary?courseName=abc&courceTitle=MyTitle
grounddetails : https://www.example.com/tag/mypage/course/#/options/grounddetails
Certification : https://www.example.com/tag/mypage/course/#/options/Certification/segment
customer: https://www.example.com/tag/mypage/course/#/checkout/customer
But whenever the URL contains the word 'confirmation', it should NOT match:
https://www.example.com/tag/mypage/course/#/**confirmation**?success=true
Could you please help me compose the regex for it?
You may try this:
^\w+ *: *https?:\/\/(?!.*confirmation).*?(?=\?|$)
Regex 101 Demo
const regex = /^\w+ *: *https?:\/\/(?!.*confirmation).*?(?=\?|$)/gm;
const str = `Home: https://www.example.com/course/home#/
courseSummary: https://www.example.com/tag/mypage/course/#/courseSummary?courseName=abc&courceTitle=MyTitle
grounddetails : https://www.example.com/tag/mypage/course/#/options/grounddetails
Certification : https://www.example.com/tag/mypage/course/#/options/Certification/segment
customer: https://www.example.com/tag/mypage/course/#/checkout/customer
But whenever the 'confirmation' word contain in URL then it SHOULD NOT match.
blalba: https://www.example.com/tag/mypage/course/#/**confirmat**?success=true
blalba: https://www.example.com/tag/mypage/course/#/**confirmation**?success=true
blalba: https://www.example.com/tag/mypage/course/#/**confirmatio**?success=true
`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
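As a compact check of the idea (a negative lookahead rejects lines containing 'confirmation', and the match stops at the first ? or at the end of the line), here is a sketch with a hypothetical helper, matchingUrls. The pattern is my own variant under those assumptions, and the three sample lines are shortened from the question.

```javascript
// Hypothetical helper: returns "label: url" up to the first '?' or end of line,
// skipping any line whose URL contains the word "confirmation".
function matchingUrls(text) {
  const pattern = /^\w+ *: *https?:\/\/(?!.*confirmation)[^?\n]*/gm;
  return text.match(pattern) || [];
}

const sample = [
  "Home: https://www.example.com/course/home#/",
  "courseSummary: https://www.example.com/tag/mypage/course/#/courseSummary?courseName=abc",
  "done: https://www.example.com/tag/mypage/course/#/confirmation?success=true",
].join("\n");

console.log(matchingUrls(sample));
// [ 'Home: https://www.example.com/course/home#/',
//   'courseSummary: https://www.example.com/tag/mypage/course/#/courseSummary' ]
```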