Qt Regex Help (Array Keys) - c++

Okay, so the following string is what my regex will attempt to match against:
[key1][key2][key3]
and here is my regex.
\[(.+?)\]
This is all being done in Qt, and here is the code I am using
QRegExp reg("\\[(.+?)\\]");
reg.indexIn(string);
qDebug() << "Matches: " << reg.capturedTexts();
The above returns this:
("", "")
So two questions then:
Why are the captures empty?
On my regex, why did I need to put \\ for it to work? If I just put \ it will not capture anything.
Thank you!

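First, let's fix your regular expression. QRegExp does not support lazy matching on an individual quantifier such as .+? (it offers setMinimal() for the whole pattern instead). So use the negated character class [^\]]+ in its place: it cannot run past a closing bracket, so it needs neither laziness nor backtracking. The new expression is as follows: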
\\[([^\\]]+)\\]
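To see the difference the negated class makes, here is a small comparison. It is a sketch that uses standard C++ std::regex rather than QRegExp (an assumption, so that it runs without Qt), and the helper name first_capture is hypothetical. A greedy .+ runs on to the last closing bracket, while [^\]]+ stops at the first one.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the first match, or "" if none.
std::string first_capture(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(1);
    return "";
}

// Greedy .+ swallows everything up to the LAST ']':
//   first_capture("[key1][key2][key3]", "\\[(.+)\\]")      gives "key1][key2][key3"
// The negated class cannot cross a ']', so it stops at the first one:
//   first_capture("[key1][key2][key3]", "\\[([^\\]]+)\\]") gives "key1"
```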
On my regex, why did I need to put \\ for it to work?
Because the pattern passes through two compilers that treat backslashes specially: first your C++ compiler, then the regex compiler inside the QRegExp constructor. In each pair, the first backslash tells the C++ compiler that the second one is a literal backslash. Once the C++ compiler is finished, each pair of backslashes has become a single backslash, which is what the regex compiler needs to see. With only one backslash, \[ is not a valid C++ escape sequence, and compilers typically warn and drop the backslash, so the regex never receives it.
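One way to avoid the double layer of escaping is a C++11 raw string literal, in which the C++ compiler leaves backslashes alone, so you write the pattern exactly as the regex engine should see it. A minimal sketch; the variable names are illustrative only.

```cpp
#include <string>

// Ordinary literal: the C++ compiler turns each "\\" into a single '\'.
const std::string escaped = "\\[([^\\]]+)\\]";

// Raw literal R"(...)": backslashes are kept exactly as written.
const std::string raw = R"(\[([^\]]+)\])";

// Both hold the same 12 characters, which is what the regex engine sees: \[([^\]]+)\]
```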
I got key1, but now how do I get the other two? reg.captureCount() returns 1.
Your regular expression captures one square-bracket-delimited item at a time. If you want to capture them all, you need a loop:
int pos = 0;
while ((pos = reg.indexIn(str, pos)) != -1) {
    qDebug() << "Match:" << reg.cap(1); // text between the brackets
    pos += reg.matchedLength();         // continue searching after this match
}
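Outside Qt, the standard library can do the looping for you. This is a sketch with std::sregex_iterator from <regex>, not a QRegExp API, and bracket_keys is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the text inside every [...] pair, in order of appearance.
std::vector<std::string> bracket_keys(const std::string& str) {
    static const std::regex reg(R"(\[([^\]]+)\])");
    std::vector<std::string> keys;
    // sregex_iterator re-runs the search after each match, like the indexIn() loop.
    for (std::sregex_iterator it(str.begin(), str.end(), reg), end; it != end; ++it)
        keys.push_back((*it)[1].str()); // group 1 = text between the brackets
    return keys;
}
```

For "[key1][key2][key3]" this yields key1, key2 and key3 in that order.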

Related

Regex match all except first instance

I am struggling to find a working regex pattern to match all instances of a string except the first.
I am using std::regex_replace to add a newline before each instance of a substring. However, none of my searching so far has turned up a working regex pattern for this.
outputString = std::regex_replace(outputString, std::regex("// ~"), "\n// ~");
So all but the first instance of // ~ should be changed to \n// ~
In this case, I'd advise being the anti-Nike: "Just don't do it!"
The pattern you're searching for is trivial. So is the replacement. There's simply no need to use a regex at all. Under the circumstances, I'd just use std::string::find in a loop, skipping replacement of the first instance you find.
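#include <iostream>
#include <string>

int main() {
    std::string s = "a = b; // ~ first line // ~ second line // ~ third line";
    const std::string pat = "// ~";
    const std::string rep = "\n// ~";
    auto pos = s.find(pat); // first one, which we skip
    if (pos != std::string::npos) { // guard: npos + pat.size() would wrap around
        while ((pos = s.find(pat, pos + pat.size())) != std::string::npos) {
            s.replace(pos, pat.size(), rep);
            pos += rep.size() - pat.size(); // step over the inserted "\n"
        }
    }
    std::cout << s << "\n";
}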
When you're doing a replacement like this one, where the replacement string includes a copy of the string you're searching for, you have to be a little careful about where you start the second (and subsequent) searches, to ensure that you don't repeatedly find the string you just replaced. That's why we keep track of the current position, and on each subsequent search we move the starting point far enough along that we won't find the same instance of the pattern again and again.
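If you still want std::regex_replace, one workaround is to leave everything up to the end of the first // ~ untouched and run the replacement only on the rest of the string. A sketch; the function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: inserts "\n" before every "// ~" except the first.
std::string newline_before_all_but_first(const std::string& s) {
    const std::string pat = "// ~";
    const auto first = s.find(pat);
    if (first == std::string::npos) return s;   // no instances at all
    const auto split = first + pat.size();      // end of the first instance
    static const std::regex re("// ~");
    return s.substr(0, split)                                   // untouched head
         + std::regex_replace(s.substr(split), re, "\n// ~");   // every later match
}
```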

Regex Replace everything except between the first " and the last "

I need a regex that replaces everything except the content between the first " and the last ".
I need it like this:
Input String:["Key:"Value""]
And after the regex i only need this:
Output String:Key:"Value"
Thanks!
You can try something like this.
Pattern:
^.*?"(.*)".*$
Substitution:
$1
Explanation:
The first part, ^.*?", matches as few characters as possible between the start of the string and a double quote.
The second part, (.*)", makes the largest match it can that ends in a double quote, and stores it all in a capture group.
The last part, .*$, grabs whatever is left and includes it in the match.
Finally, you replace the entire match with the contents of the first capture group.
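The same pattern and $1 substitution also work with std::regex_replace in C++, assuming that is where the regex will run (the question does not say). A sketch; between_outer_quotes is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Keeps only what lies between the first and the last double quote.
std::string between_outer_quotes(const std::string& input) {
    // ^.*?"  lazy run up to the first quote
    // (.*)"  greedy capture up to the last quote
    // .*$    whatever is left
    static const std::regex re("^.*?\"(.*)\".*$");
    return std::regex_replace(input, re, "$1"); // unchanged if there is no match
}
```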
Can you say why you need a RegExp?
A function like:
String unquote(String input) {
int start = input.indexOf('"');
if (start < 0) return input; // or throw.
int end = input.lastIndexOf('"');
if (start == end) return input; // or throw
return input.substring(start + 1, end);
}
is going to be faster and easier to understand than a RegExp.
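For readers working in C++ rather than Dart, the same no-regex approach translates directly to std::string::find and rfind. A sketch, not part of the original answer.

```cpp
#include <string>

// Hypothetical C++ counterpart of unquote(): no regex, just find/rfind.
std::string unquote(const std::string& input) {
    const auto start = input.find('"');
    if (start == std::string::npos) return input; // no quote (or throw)
    const auto end = input.rfind('"');
    if (start == end) return input;               // only one quote (or throw)
    return input.substr(start + 1, end - start - 1);
}
```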
Anyway, for the challenge, let's say we do want a RegExp that replaces the part up to the first " and the part from the last " with nothing. That's two replacements, so you can do:
input.replaceAll(RegExp(r'^[^"]*"|"[^"]*$'), "")
or you can use a capturing group and a computed replacement like:
input.replaceFirstMapped(RegExp(r'^[^"]*"([^]*)"[^"]*$'), (m) => m[1]!)
Alternatively, you can use the capturing group to select the text between the two and extract it in code, instead of doing string replacement:
String unquote(String input) {
var re = RegExp(r'^[^"]*"([^]*)"[^"]*$');
var match = re.firstMatch(input);
if (match == null) return input; // or throw.
return match[1]!;
}

Remove spaces from string before period and comma

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period in my whole string. I tried putting this into a char vector and skipping the push_back when the current char was " " and the next char was "." or ",", but it did not work. I know there may be a simple way to do it using trim(), find(), or erase(), or some kind of regex, but I am not very familiar with regex.
A solution could be (using the <regex> library):
#include <iterator>
#include <regex>
#include <string>

std::string fix_string(const std::string& str) {
    // one or more whitespace chars, but only when followed by '.' or ','
    static const std::regex rgx_pattern("\\s+(?=[\\.,])");
    std::string rtn;
    rtn.reserve(str.size());
    std::regex_replace(std::back_inserter(rtn),
                       str.cbegin(), str.cend(),
                       rgx_pattern, "");
    return rtn;
}
This function takes a string as input and returns a copy with the whitespace before each comma and period removed.
In a loop, search for the string " ," and replace each occurrence you find with ",":
std::string str = "...";
while( true ) {
auto pos = str.find( " ," );
if( pos == std::string::npos )
break;
str.replace( pos, 2, "," );
}
Do the same for " .". If you need to handle other whitespace characters such as tabs, use a regex with an appropriate character class.
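Handling " ," and " ." can be folded into one helper. A sketch (the name remove_space_before is hypothetical) that also avoids rescanning from the start of the string after each replacement, and still copes with several spaces in a row:

```cpp
#include <initializer_list>
#include <string>

// Hypothetical helper: turns " ," into "," and " ." into ".", repeatedly.
std::string remove_space_before(std::string str) {
    for (const char* pair : {" ,", " ."}) {
        std::string::size_type pos = 0;
        while ((pos = str.find(pair, pos)) != std::string::npos) {
            str.erase(pos, 1);   // drop the space, keep the punctuation
            if (pos > 0) --pos;  // an earlier space may now sit right before it
        }
    }
    return str;
}
```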
I don't know how to use regexes in C++, and I'm not sure whether C++ supports PCRE-style regexes, but I'm posting this answer for the regex itself (I can delete it if it doesn't work in C++).
You can use this regex:
\s+(?=[,.])
First, there is no need to use a vector of char: you could very well do the same by using an std::string.
Then, your approach can't work because your copy is independent of the position of the space. Unfortunately you have to remove only spaces around the punctuation, and not those between words.
Modifying your code slightly, you could delay copying spaces until you see the first non-space character: if it's not punctuation, you copy a space before the character; otherwise you copy just the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation mark, just loop and ignore the following spaces until the first non-space character.
I could have written the code. It would have been shorter. But I prefer letting you finish your homework with a full understanding of the approach.
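For checking your own version afterwards, here is a minimal sketch of the delayed-space idea. It only drops spaces that come right before a comma or period, which is what the question asks for, and keeps every other space as it is; fix_spaces is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: spaces are held back until the next non-space char decides their fate.
std::string fix_spaces(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t pending = 0;            // spaces seen but not yet copied
    for (char c : in) {
        if (c == ' ') { ++pending; continue; }
        if (c != ',' && c != '.')
            out.append(pending, ' ');   // ordinary gap: copy the spaces
        pending = 0;                    // before punctuation: spaces dropped
        out += c;
    }
    out.append(pending, ' ');           // trailing spaces are kept
    return out;
}
```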

Find and replace with regular expressions

I'm trying to replace a bunch of function calls using regular expressions but can't seem to be getting it right. This is a simplified example of what I'm trying to do:
GetPetDog();
GetPetCat();
GetPetBird();
I want to change to:
GetPet<Animal_Dog>();
GetPet<Animal_Cat>();
GetPet<Animal_Bird>();
Use the following regex:
(GetPet)([^(]*) with substitution \1<Animal_\2>
You can use the following regex and code for that:
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string ss("GetPetDog();");
    static const std::regex ee("GetPet([^()]*)");
    std::string result = std::regex_replace(ss, ee, "GetPet<Animal_$1>");
    std::cout << result << std::endl;
}
Regex:
GetPet - Matches GetPet literally (we need no capturing group here)
([^()]*) - A capturing group to match any characters other than ( or ) 0 or more times (*)
Output:
GetPet<Animal_Dog>();
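std::regex_replace replaces every match by default, so the same call can be run over a whole file's contents at once. A sketch; the helper name and sample text are illustrative. Note that a bare GetPet() call would become GetPet<Animal_>(), since the group may match zero characters.

```cpp
#include <regex>
#include <string>

// Rewrites every GetPetXxx( call in the text to GetPet<Animal_Xxx>(.
std::string templatize_get_pet(const std::string& source) {
    static const std::regex ee("GetPet([^()]*)");
    return std::regex_replace(source, ee, "GetPet<Animal_$1>");
}
```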

Why am I getting multiple regex matches?

I'm trying to write a processor for GLSL shader code that will allow me to analyze the code and dynamically determine what inputs and outputs I need to handle for each shader.
To accomplish that, I decided to use some regex to parse the shader code before I compile it via OpenGL.
I've written some test code to verify that the regex is working as I expect.
Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main()
{
string strInput = " in vec3 i_vPosition; ";
smatch match;
// Will appear in regex as:
// \bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(\[[0-9]+\])?\s*;
regex rgx("\\bin\\s+[a-zA-Z0-9]+\\s+[a-zA-Z0-9_]+\\s*(\\[[0-9]+\\])?\\s*;");
bool bMatchFound = regex_search(strInput, match, rgx);
cout << "Match found: " << bMatchFound << endl;
for (int i = 0; i < match.size(); ++i)
{
cout << "match " << i << " (" << match[i] << ") ";
cout << "at position " << match.position(i) << std::endl;
}
}
The only problem is that the above code generates two results instead of one, though one of the results is empty.
Output:
Match found: 1
match 0 (in vec3 i_vPosition;) at position 6
match 1 () at position 34
I ultimately want to generate multiple results when I provide a whole file as input, but I'd like to get some consistency so that I can process the results in a consistent manner.
Any ideas as to why I'm getting multiple results when I'm only expecting one?
Your regex contains a capturing group
(\[[0-9]+\])?
which would contain square brackets surrounding 1 or more digits, but the ? makes it optional.
When applying the regex, the leading and trailing spaces are trimmed by the
\s+ ... \s*
The remainder of the string is matched by
[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*
And that optional capturing group matches nothing, so its capture is an empty string.
If you want to match strings that optionally contain that part, but not return it as a capture, make the group non-capturing with ?:, like this:
\bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(?:\[[0-9]+\])?\s*;
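A quick way to confirm this is to compare match.size() for the two versions of the pattern. A sketch; group_count_for is a hypothetical helper, and the patterns are the question's and the non-capturing variant, written as raw string literals.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: size() of the match_results for the first match (0 if none).
std::size_t group_count_for(const std::string& input, const std::string& pattern) {
    std::smatch match;
    std::regex_search(input, match, std::regex(pattern));
    return match.size();
}

const std::string capturing =
    R"(\bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(\[[0-9]+\])?\s*;)";
const std::string non_capturing =
    R"(\bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(?:\[[0-9]+\])?\s*;)";
// group_count_for(" in vec3 i_vPosition; ", capturing)     is 2
// group_count_for(" in vec3 i_vPosition; ", non_capturing) is 1
```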
I ultimately want to generate multiple results
regex_search only finds the first match of the complete regular expression.
If you want to find the other places in your source text that the complete regular expression matches,
you must run regex_search repeatedly.
See
" C++ Regex to match words without punctuation "
for an example of repeatedly running the search.
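A minimal sketch of running regex_search repeatedly over a whole shader source, resuming at match.suffix() each time. find_inputs is a hypothetical name. The match_prev_avail flag is passed on later searches so that \b can look at the character just before the new starting point instead of treating it as the start of the text.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: every complete match of the question's regex, in order.
std::vector<std::string> find_inputs(const std::string& src) {
    static const std::regex rgx(
        R"(\bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(\[[0-9]+\])?\s*;)");
    std::vector<std::string> found;
    std::smatch match;
    auto begin = src.cbegin();
    auto flags = std::regex_constants::match_default;
    while (std::regex_search(begin, src.cend(), match, rgx, flags)) {
        found.push_back(match.str(0));                   // the whole declaration
        begin = match.suffix().first;                    // resume right after this match
        flags = std::regex_constants::match_prev_avail;  // let \b see the previous char
    }
    return found;
}
```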
the above code generates two results instead of one.
Confusing, isn't it?
The regular expression
\bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(\[[0-9]+\])?\s*;
includes round brackets ().
The round brackets create a "group" aka "sub-expression".
Because the sub-expression is optional "(....)?",
the expression as a whole is allowed to match even if the sub-expression doesn't really match anything.
When the sub-expression doesn't match anything, the value of that sub-expression is an empty string.
See "Regular-expressions: Use Round Brackets for Grouping" for far more information on "capturing parenthesis" and "non-capturing parenthesis".
According to the documentation for regex_search,
match.size() is the number of subexpressions plus 1,
match[0] is the part of the source string that matches the complete regular expression.
match[1] is the part of the source string that matches the first sub-expression inside the regular expression.
match[n] is the part of the source string that matches the n'th sub-expression inside the regular expression.
A regular expression with only one sub-expression, as in the above example, will always give a match.size() of 2 whenever it finds a match: one entry for the complete regular expression and one for the sub-expression, even when that sub-expression doesn't take part in the match and is therefore an empty string.
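To tell whether the optional array suffix was actually present, check match[1].matched rather than relying on the size. A sketch; has_array_suffix is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the declaration carries an [N] array suffix.
bool has_array_suffix(const std::string& decl) {
    static const std::regex rgx(
        R"(\bin\s+[a-zA-Z0-9]+\s+[a-zA-Z0-9_]+\s*(\[[0-9]+\])?\s*;)");
    std::smatch match;
    if (!std::regex_search(decl, match, rgx)) return false; // no declaration at all
    // match.size() is 2 either way; .matched says whether group 1 took part.
    return match[1].matched;
}
```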