Regex match all except first instance - c++

I am struggling to find a working regex pattern to match all instances of a string except the first.
I am using std::regex_replace to insert a newline before each instance of a substring. However, none of the patterns I have found by searching so far works for this.
outputString = std::regex_replace(outputString, std::regex("// ~"), "\n// ~");
So all but the first instance of // ~ should be changed to \n// ~

In this case, I'd advise being the anti-Nike: "Just don't do it!"
The pattern you're searching for is trivial. So is the replacement. There's simply no need to use a regex at all. Under the circumstances, I'd just use std::string::find in a loop, skipping replacement of the first instance you find.
std::string s = "a = b; // ~ first line // ~ second line // ~ third line";
std::string pat = "// ~";
std::string rep = "\n// ~";
auto pos = s.find(pat); // first one, which we skip
while ((pos=s.find(pat, pos+pat.size())) != std::string::npos) {
s.replace(pos, pat.size(), rep);
pos += rep.size() - pat.size();
}
std::cout << s << "\n";
When you're doing a replacement like this one, where the replacement string includes a copy of the string you're searching for, you have to be a little careful about where you start second (and subsequent) searches, to assure that you don't repeatedly find the same string you just replaced. That's why we keep track of the current position, and each subsequent search we make, we move the starting point far enough along that we won't find the same instance of the pattern again and again.
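If a regex is still wanted, for instance because the real pattern is less trivial than this one, one possible sketch is to locate the first match with std::regex_search and then run std::regex_replace only on the text after it. The helper name replace_all_but_first is made up for this example, and the approach assumes the pattern has no anchors (such as ^ or \b) whose meaning depends on the text before the split point.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies `fmt` to every match of `re` except the first.
std::string replace_all_but_first(const std::string& text,
                                  const std::regex& re,
                                  const std::string& fmt) {
    std::smatch first;
    if (!std::regex_search(text, first, re)) {
        return text;  // no match at all, nothing to replace
    }
    // Keep everything up to and including the first match untouched.
    std::string result(text.cbegin(), first[0].second);
    // Let regex_replace handle the remainder of the string.
    result += std::regex_replace(std::string(first[0].second, text.cend()), re, fmt);
    return result;
}
```

Called as replace_all_but_first(outputString, std::regex("// ~"), "\n// ~"), this leaves the first marker alone and puts a newline in front of each later one.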

Related

Quick regex_search/replace, or clear indication of replacement?

I must browse a collection of strings to replace a pattern and save the changes.
The saving operation is (very) expensive and out of my hands, so I would like to know beforehand if the replacement did anything.
I can use std::regex_search to learn whether the pattern is present in my input, and use capture groups to store details in a std::smatch. std::regex_replace does not seem to explicitly tell me whether it did anything.
The patterns and strings are arbitrarily long and complicated; running regex_replace after a regex_search seems wasteful.
I can directly compare the input and output to search for a discrepancy but that too is uncomfortable.
Is there either a simple way to observe regex_replace to determine its impact, or a way to use the smatch filled by regex_search to do a faster replacement operation?
Thanks in advance.
No, regex_replace doesn't provide this information, and yes, you can do it with a regex_search loop.
For example like this:
std::regex pattern("...");
std::string replacement_format = "...";
std::string input = "......"; // a very, very long string
std::string output, replacement;
std::smatch match;
auto begin = input.cbegin();
int replacements = 0;
while (std::regex_search(begin, input.cend(), match, pattern)) {
output += match.prefix();
replacement = match.format(replacement_format);
if (match[0] != replacement) {
replacements++;
}
output += replacement;
begin = match.suffix().first;
}
output.append(begin, input.cend());
if (replacements > 0) {
// process output ...
}
As regex_replace creates a copy of your string you could simply compare the replaced string with the original one and only "store" the new one if they differ.
The regex_replace overloads that write through an output iterator (available since C++11) also return an iterator just past the last character written. From https://www.cplusplus.com/reference/regex/regex_replace/:
"Versions 5 and 6 return an iterator that points to the element past the last character written to the sequence pointed by out."
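The comparison approach mentioned above can be sketched as a small helper. The name replace_if_changed is hypothetical; the sketch assumes that one extra string comparison is acceptable compared with the cost of the expensive save.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: runs regex_replace and reports whether the text changed.
// `text` is only overwritten when the replacement produced something different.
bool replace_if_changed(std::string& text,
                        const std::regex& re,
                        const std::string& fmt) {
    std::string replaced = std::regex_replace(text, re, fmt);
    if (replaced == text) {
        return false;  // nothing changed, so the save can be skipped
    }
    text = std::move(replaced);
    return true;
}
```

The caller would then trigger the save only when the function returns true. Note that a match whose replacement is identical to the matched text counts as "unchanged" here, which is the same convention the regex_search loop uses.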

Remove spaces from string before period and comma

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period in my whole string. I tried throwing this into a char vector and only not push_back if the current char was " " and the following char was a "." or "," but it did not work. I know there is a simple way to do it maybe using trim(), find(), or erase() or some kind of regex but I am not the most familiar with regex.
A solution could be (using regex library):
std::string fix_string(const std::string& str) {
static const std::regex rgx_pattern("\\s+(?=[\\.,])");
std::string rtn;
rtn.reserve(str.size());
std::regex_replace(std::back_insert_iterator<std::string>(rtn),
str.cbegin(),
str.cend(),
rgx_pattern,
"");
return rtn;
}
This function takes a string as input and returns a copy with the whitespace before each comma and period removed.
Search in a loop for the string " ," and, whenever you find one, replace it with ",":
std::string str = "...";
while( true ) {
auto pos = str.find( " ," );
if( pos == std::string::npos )
break;
str.replace( pos, 2, "," );
}
Do the same for " .". If you need to handle other whitespace characters such as tabs, use a regex with a suitable character class.
I don't know how to use regex in C++, and I'm not sure whether C++ supports PCRE syntax, but I'm posting this answer for the regex itself (I can delete it if it doesn't work in C++).
You can use this regex:
\s+(?=[,.])
First, there is no need to use a vector of char: you could do the same with a std::string.
Then, your approach can't work because your copy does not take the position of the space into account. You have to remove only the spaces next to punctuation, not those between words.
By modifying your code slightly, you could delay copying spaces until you see the first non-space character: if it is not punctuation, you copy a space before the character; otherwise you copy just the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation character, just loop and ignore the following spaces until the first non-space character.
I could have written the code; it would have been shorter. But I prefer letting you finish your homework with a full understanding of the approach.
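For readers who want to check their own attempt, here is a minimal sketch of the "delay the spaces" idea. It only covers what the question asks for (spaces before a comma or period) and leaves the spaces after punctuation in place. The function name is made up, and unlike the description above it keeps the original number of spaces between words rather than collapsing them to one.

```cpp
#include <cstddef>
#include <string>

// Sketch: drop runs of spaces that directly precede ',' or '.'.
std::string remove_space_before_punct(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t pending = 0;  // spaces seen but not yet copied
    for (char c : in) {
        if (c == ' ') {
            ++pending;  // decide later whether these spaces are kept
            continue;
        }
        if (c != ',' && c != '.') {
            out.append(pending, ' ');  // spaces were between words: keep them
        }
        pending = 0;  // before punctuation the pending spaces are discarded
        out.push_back(c);
    }
    out.append(pending, ' ');  // trailing spaces, if any
    return out;
}
```

Only the space character is treated as whitespace here; tabs would need an extra condition.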

Find a special part in a string

I try to locate a special part in a string.
The example of string as follow:
22.21594087,1.688530832,0
I want to locate 1.688530832 out.
I tried
temp.substr(temp.find(",")+1,temp.rfind(","));
and got 1.688530832,0.
I replaced rfind() with find_last_of() but still got the same result.
temp.substr(temp.find(",")+1,temp.find_last_of(","));
I know this is a simple problem and there are other solutions. But I just want to know why rfind did not work.
Thank you very much!
The second argument for substr is not the ending index, but rather the length of the substring you want. Simply throw in the length of 1.688530832 and you'll be fine.
If the length of the substring is not known in advance, you can find the position of the last comma and subtract from it the position of the first character of the part you want:
auto beginning_index = temp.find(",") + 1;
auto last_comma_index = temp.rfind(",");
temp.substr(beginning_index, last_comma_index - beginning_index);
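Wrapped into a function with the missing-comma cases handled, the same idea might look like the sketch below. The function name and the choice to return an empty string when there are fewer than two commas are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper: returns the text between the first and last comma,
// or an empty string when the input has fewer than two commas.
std::string between_first_and_last_comma(const std::string& s) {
    const auto first = s.find(',');
    const auto last = s.rfind(',');
    if (first == std::string::npos || first == last) {
        return {};
    }
    const auto begin = first + 1;
    return s.substr(begin, last - begin);  // arguments: start position, length
}
```

For the string in the question, between_first_and_last_comma("22.21594087,1.688530832,0") yields "1.688530832".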
I see what you are doing. You are trying to pass something like iterators to the beginning and the end of a substring. Unfortunately, substr does not work that way; it expects a starting index and a length (a count of characters from that index) to select the substring.
What you were trying to achieve can be done with std::find, which does work with iterators:
auto a = std::next(std::find(begin(temp), end(temp), ','));
auto b = std::next(std::find(rbegin(temp), rend(temp), ',')).base();
std::cout << std::string(a, b);

Qt Regex Help (Array Keys)

Okay, so the following string is what my regex will attempt to match against:
[key1][key2][key3]
and here is my regex.
\[(.+?)\]
This is all being done in Qt, and here is the code I am using
QRegExp reg("\\[(.+?)\\]");
reg.indexIn(string);
qDebug() << "Matches: " << reg.capturedTexts();
The above returns this:
("", "")
So two questions then:
Why are the captures empty
On my regex, why did I need to put \\ for it to work? If I just put \ it will not capture anything.
Thank you!
First, let's optimize your regular expression: instead of the reluctant expression .+?, use [^\]]+, which lets you avoid so-called catastrophic backtracking. The new expression is as follows:
\\[([^\\]]+)\\]
On my regex, why did I need to put \\ for it to work?
Because the regex goes through two compilers that pay attention to backslashes: first your C++ compiler, and then the regex compiler inside the QRegExp constructor. The first backslash of each pair is for the C++ compiler; the second one is for the regex compiler. Once the C++ compiler is finished, each pair of backslashes has been replaced with a single backslash, which is what the regex needs.
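To see the two levels of escaping side by side, here is a small sketch in plain C++ (no Qt involved; the variable names are only for illustration). A raw string literal, available since C++11, skips the first level, so the pattern can be written exactly as the regex compiler should see it.

```cpp
#include <string>

// Both literals hold the same nine characters: \[(.+?)\]
// Ordinary literal: the C++ compiler turns each pair of backslashes into one.
const std::string escaped_pattern = "\\[(.+?)\\]";
// Raw literal: the C++ compiler keeps the backslashes as written.
const std::string raw_pattern = R"(\[(.+?)\])";
```

Either string could then be handed to a regex constructor; the regex compiler receives the single-backslash form in both cases.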
I got key1, but now how do I get the other 2? reg.capturedCount() returns 1
Your regular expression captures one square-bracket-delimited item at a time. If you want to capture them all, you need a loop:
int pos = 0;
while (pos >= 0) {
pos = reg.indexIn(str, pos);
if (pos >= 0) {
pos += reg.matchedLength(); // continue searching after this match
qDebug() << "Matches: " << reg.capturedTexts();
}
}
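For comparison, the same "one item per iteration" loop can be sketched with the standard library instead of QRegExp. This is a swapped-in technique, not Qt code: it uses std::regex and std::sregex_iterator, and the function name extract_keys is made up for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch using std::regex (not QRegExp): collects the text inside each [...] pair.
std::vector<std::string> extract_keys(const std::string& text) {
    static const std::regex key_re(R"(\[([^\]]+)\])");
    std::vector<std::string> keys;
    // Each iterator step performs one search, starting after the previous match.
    for (std::sregex_iterator it(text.begin(), text.end(), key_re), end;
         it != end; ++it) {
        keys.push_back((*it)[1].str());  // capture group 1 is the key itself
    }
    return keys;
}
```

For the input [key1][key2][key3] this collects key1, key2 and key3 in order.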

C++ boost::regex multiples captures

I'm trying to extract multiple substrings with boost::regex and store each one in a variable. Here is my code:
unsigned int i = 0;
std::string string = "--perspective=45.0,1.33,0.1,1000";
std::string::const_iterator start = string.begin();
std::string::const_iterator end = string.end();
std::vector<std::string> matches;
boost::smatch what;
boost::regex const ex(R"(^-?\d*\.?\d+),(^-?\d*\.?\d+),(^-?\d*\.?\d+),(^-?\d*\.?\d+))");
string.resize(4);
while (boost::regex_search(start, end, what, ex))
{
std::string stest(what[1].first, what[1].second);
matches[i] = stest;
start = what[0].second;
++i;
}
I'm trying to extract each float from my string and put it in my vector variable matches. My result, at the moment, is that I can extract the first one (in my vector I can see "45", without the double quotes), but the second one is empty (matches[1] is "").
I can't figure out why or how to correct this. So my question is: how do I correct this? Is my regex incorrect? Is my smatch incorrect?
Firstly, ^ is the symbol for the beginning of a line. Secondly, in an ordinary (non-raw) string literal each \ must be escaped; inside a raw string literal such as R"(...)" a single \ is already correct. So in an ordinary literal you should change each (^-?\d*\.?\d+) group to (-?\\d*\\.?\\d+). (Probably (-?\\d+(?:\\.\\d+)?) is better.)
Your regular expression searches for the number,number,number,number pattern, not for each individual number. You add only the first substring to matches and ignore the others. To fix this, you can replace your expression with (-?\\d*\\.?\\d+) or just add all the matches stored in what to your matches vector:
while (boost::regex_search(start, end, what, ex))
{
for (std::size_t j = 1; j < what.size(); ++j) // group 0 is the whole match, so start at 1
{
std::string stest(what[j].first, what[j].second);
matches.push_back(stest);
}
start = what[0].second;
}
You are using ^ several times in your regex; that's why it didn't match. ^ means the beginning of the string. You also have an extra ) at the end of the regex; I don't know what that closing parenthesis is doing there.
Here is your regex after correction:
(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+)
A better version of your regex (only if you want to avoid matching numbers like .01 or .1) could be:
(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)
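As a rough check of that last pattern, here is a sketch that searches once and copies the four capture groups out. It uses std::regex rather than boost::regex (an assumption made so the example needs no external library; the calls shown have the same shape in Boost), and the function name extract_four is made up.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Sketch with std::regex: one search, four capture groups.
std::vector<std::string> extract_four(const std::string& text) {
    static const std::regex four_re(
        R"((-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?))");
    std::vector<std::string> fields;
    std::smatch what;
    if (std::regex_search(text, what, four_re)) {
        // what[0] is the whole match; the numbers are in groups 1 to 4.
        for (std::size_t j = 1; j < what.size(); ++j) {
            fields.push_back(what[j].str());
        }
    }
    return fields;
}
```

With the string from the question, the vector ends up holding 45.0, 1.33, 0.1 and 1000.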
A repeated search in combination with a regular expression that apparently is built to match all of the target string is pointless.
If you are searching repeatedly in a moving window delimited by a moving iterator and string.end() then you should reduce the pattern to something that matches a single fraction.
If you know that the number of fractions in your string is (or must be) constant, match once, not in a loop, and extract the matched substrings from what.
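The moving-window variant, where the pattern matches a single number and the search is repeated, might be sketched as follows. Again this uses std::regex instead of boost::regex as a stand-in, the function name is hypothetical, and the single-number pattern is the stricter one suggested in the other answers.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch with std::regex: the pattern matches one number; the iterator
// repeats the search from the end of the previous match.
std::vector<std::string> extract_numbers(const std::string& text) {
    static const std::regex number_re(R"(-?\d+(?:\.\d+)?)");
    std::vector<std::string> numbers;
    for (std::sregex_iterator it(text.begin(), text.end(), number_re), end;
         it != end; ++it) {
        numbers.push_back(it->str());  // str() with no argument is the whole match
    }
    return numbers;
}
```

This version does not care how many numbers the string contains, which is the main difference from the single-match approach.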