Save all regex matches in a vector - c++

I need a function that finds all matches of a regex in a string, stores them in a vector, and then returns an arbitrary capture group from one chosen match. I tried this:
std::string match(std::string basestring, std::string regex, int index, int group) {
    std::vector<std::smatch> match;
    // Here I need a while loop that iterates over all matches,
    // but I'm not sure which regex_search overload to use.
    return match.at(index)[group];
}
My idea was to get one match, then start the next search just past the end of that match to get the next one. When no match is found, there are no more matches and the loop ends; the index and group arguments then select the desired capture group within a match. But I can't find a regex_search overload that takes a start position (or start and end positions) as well as the target string.

I found the solution myself after some hours of digging. This code does the job:
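#include <regex>
#include <string>
#include <vector>
// Returns capture group `group` of match number `index` (both 0-based).
std::string match(std::string s, std::string r, int index = 0, int group = 0) {
    std::regex rx(r);
    // sregex_iterator walks over every non-overlapping match of rx in s.
    auto matches_begin = std::sregex_iterator(s.begin(), s.end(), rx);
    auto matches_end = std::sregex_iterator();
    std::vector<std::smatch> matches(matches_begin, matches_end);
    // at() throws std::out_of_range if there are fewer than index + 1 matches.
    return matches.at(index)[group];
}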
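For comparison, the approach described in the question (search again just past the end of the previous match) also works: std::regex_search has an overload taking an iterator range, regex_search(first, last, m, re, flags), which is the one the question was looking for. A minimal sketch, with all_matches as a hypothetical helper name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of re in s by calling std::regex_search repeatedly,
// each time starting where the previous match ended.
// The returned smatch objects point into s, so s must outlive the vector.
std::vector<std::smatch> all_matches(const std::string& s, const std::regex& re) {
    std::vector<std::smatch> out;
    std::smatch m;
    auto start = s.cbegin();
    std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
    while (std::regex_search(start, s.cend(), m, re, flags)) {
        out.push_back(m);
        if (m[0].length() > 0) {
            start = m[0].second;      // resume right after this match
        } else if (m[0].second != s.cend()) {
            start = m[0].second + 1;  // empty match: step one character to avoid looping forever
        } else {
            break;                    // empty match at the very end: done
        }
        // From now on there is a character before `start`; tell the engine
        // so that ^ and \b are evaluated correctly.
        flags |= std::regex_constants::match_prev_avail;
    }
    return out;
}
```

std::sregex_iterator does this bookkeeping for you and treats empty matches more carefully than the one-character step used here, which is why it is usually the simpler choice.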

Related

How to find a character in a string?

I have already checked similar topics, but none of them solved this problem.
I need to look for a character inside a string, but it doesn't seem to work:
if (tracciatonuovos.find('T'))
{
nterminale++;
}
The counter does not increase. But if I search for a space instead, it does count, even though the string is full (it has no spaces).
In the output I printed, the first value is the string, the second is its length, and the third is the value of the counter nterminale.
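Use the find function from the std::string class, and compare its result against std::string::npos. find returns the index of the first match, so if 'T' is the first character it returns 0, which converts to false; if the character is not there at all it returns std::string::npos, which converts to true. That is why your counter stays unchanged for 'T' but increases for a space that isn't in the string.
std::string mystr = "Some String with T";
std::size_t apos = mystr.find('T');
if (apos != std::string::npos) { /* 'T' found at index apos */ }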
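Note that find already returns the first occurrence. If you need the first occurrence of any one of several characters, use find_first_of, e.g. mystr.find_first_of("TX").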
If you want to find all occurrences of a specific character, you also need to pass a start position to find and call it in a loop, something like:
std::size_t pos = 0;
while ((pos = mystr.find('T', pos)) != std::string::npos)
{
    // 'T' found at index pos: your other logic goes here
    pos += 1; // resume the search just after this occurrence
}
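Wrapped in a function, that loop counts how many times a character occurs, which appears to be what the question's counter is after. A minimal sketch; count_char and count_char_std are hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts occurrences of c by calling find repeatedly with a start position,
// resuming one character past each hit.
std::size_t count_char(const std::string& str, char c) {
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = str.find(c, pos)) != std::string::npos) {
        ++count;
        pos += 1;  // continue searching after this occurrence
    }
    return count;
}

// Same result using the standard algorithm std::count.
std::size_t count_char_std(const std::string& str, char c) {
    return static_cast<std::size_t>(std::count(str.begin(), str.end(), c));
}
```

For a single character, std::count does the job in one call; the find loop is more useful when searching for substrings, where you would advance pos by the substring's length instead of 1.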

Replace substring within a string c++

I want to replace a substring within a string.
For example, given the string aa0_aa1_bb3_c*a0_a,
I want to replace the substring a0_a with b1_a, but I don't want the a0_a inside aa0_a to be replaced.
Basically, no letter may appear immediately before or after the substring "a0_a" that gets replaced.
Regular expressions are good at this. They have been part of the standard library since C++11; if you have an older compiler, you can use Boost.Regex instead.
With the standard library version, you could do:
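#include <regex>
#include <string>
// Group 1 is the non-letter before the match, or empty at the start of the string (note a-z, not the invalid range a-Z).
// The negative lookahead (?![A-Za-z]) rejects a following letter but still allows a match at the end of the string.
std::regex rx("(^|[^A-Za-z])a0_a(?![A-Za-z])");
std::string result = std::regex_replace("aa0_aa1_bb3_c*a0_a", rx, "$1b1_a");
// result is "aa0_aa1_bb3_c*b1_a"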
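It's also easy enough to do without regexes: find each occurrence and check the character on either side of it before replacing.
#include <cctype>
#include <string>
// Replaces every occurrence of `from` that has no letter directly before or after it.
void replace_standalone(std::string& s, const std::string& from, const std::string& to) {
    if (from.empty()) return; // nothing sensible to search for
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        std::size_t end = pos + from.size();
        bool letter_before = pos > 0 && std::isalpha(static_cast<unsigned char>(s[pos - 1]));
        bool letter_after = end < s.size() && std::isalpha(static_cast<unsigned char>(s[end]));
        if (!letter_before && !letter_after) {
            s.replace(pos, from.size(), to);
            pos += to.size(); // continue after the inserted text
        } else {
            ++pos; // not standalone: keep searching from the next character
        }
    }
}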
Note that you will need to check for indexing out of bounds when looking at the substrings.

Find index of first match using C++ regex

I'm trying to write a split function in C++ using regexes. So far I've come up with this:
vector<string> split(string s, regex r)
{
vector<string> splits;
while (regex_search(s, r))
{
int split_on = // index of regex match
splits.push_back(s.substr(0, split_on));
s = s.substr(split_on + 1);
}
splits.push_back(s);
return splits;
}
What I want to know is how to fill in the commented line.
You'll need just a little more than that; see the comments in the code below. The main trick is to use a match object (here std::smatch, because you're matching on a std::string) to remember where you matched, not just that you did:
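#include <regex>
#include <string>
#include <vector>
using namespace std;
// Caveat: r must not be able to match an empty string, or the loop below never ends.
vector<string> split(string s, regex r)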
{
vector<string> splits;
smatch m; // <-- need a match object
while (regex_search(s, m, r)) // <-- use it here to get the match
{
int split_on = m.position(); // <-- use the match position
splits.push_back(s.substr(0, split_on));
s = s.substr(split_on + m.length()); // <-- also, skip the whole match
}
if(!s.empty()) {
splits.push_back(s); // and there may be one last token at the end
}
return splits;
}
This can be used like so:
auto v = split("foo1bar2baz345qux", std::regex("[0-9]+"));
and will give you "foo", "bar", "baz", "qux".
std::smatch is an alias for std::match_results<std::string::const_iterator>; see the std::match_results reference documentation for its members, such as position(), length(), prefix() and suffix().
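If you don't need the match positions themselves, std::sregex_token_iterator can do the splitting: passing -1 as the submatch index makes it yield the text between matches rather than the matches. A minimal sketch (split_tokens is a hypothetical name); like the loop above, it drops an empty trailing piece:

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits s on every match of r. Submatch index -1 selects the text
// *between* matches. The token iterator keeps a pointer to r, so r must be
// an lvalue that outlives the iteration (passing a temporary regex is deleted).
std::vector<std::string> split_tokens(const std::string& s, const std::regex& r) {
    std::sregex_token_iterator first(s.begin(), s.end(), r, -1);
    std::sregex_token_iterator last;  // default-constructed = end of sequence
    return std::vector<std::string>(first, last);
}
```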

C++11 regex replace

I have an XML string that I want to log. This XML contains some sensitive data that I'd like to mask out before sending it to the log file. Currently I'm using std::regex to do this:
std::regex reg("<SensitiveData>(\\d*)</SensitiveData>");
return std::regex_replace(xml, reg, "<SensitiveData>......</SensitiveData>");
Currently the data is replaced by exactly 6 '.' characters; what I really want is to replace the sensitive data with the correct number of dots, i.e. get the length of the capture group and put down exactly that many dots.
Can this be done?
regex_replace from the C++11 regular expression library does not have the capability you are asking for: the replacement format argument must be a string. Some regular expression APIs allow the replacement to be a function that receives the match and returns the replacement text, which could perform exactly the substitution you need.
But regexes are not the only way to solve a problem, and in C++ it's not hard to look for two fixed strings and replace the characters in between:
#include <cstring>  // strlen
#include <string>
const char* const PREFIX = "<SensitiveData>";
const char* const SUFFIX = "</SensitiveData>";
void replace_sensitive(std::string& xml) {
size_t start = 0;
while (true) {
size_t pref, suff;
if ((pref = xml.find(PREFIX, start)) == std::string::npos)
break;
if ((suff = xml.find(SUFFIX, pref + strlen(PREFIX))) == std::string::npos)
break;
// replace stuff between prefix and suffix with '.'
for (size_t i = pref + strlen(PREFIX); i < suff; i++)
xml[i] = '.';
start = suff + strlen(SUFFIX);
}
}
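If you would rather keep the regex, you can emulate a function-based replacement with the standard library by walking the matches with std::sregex_iterator and building the result yourself: copy the text before each match, then append a replacement computed from the match (here, one dot per captured character). A minimal sketch; mask_sensitive is a hypothetical name and the pattern is the one from the question:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Replaces the digits inside each <SensitiveData>...</SensitiveData> with
// the same number of '.' characters.
std::string mask_sensitive(const std::string& xml) {
    static const std::regex reg("<SensitiveData>(\\d*)</SensitiveData>");
    std::string out;
    auto last = xml.cbegin();  // end of the previous match
    for (std::sregex_iterator it(xml.begin(), xml.end(), reg), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // text before this match
        out += "<SensitiveData>";
        out.append(static_cast<std::size_t>(m[1].length()), '.');  // one dot per captured digit
        out += "</SensitiveData>";
        last = m[0].second;
    }
    out.append(last, xml.cend());  // text after the last match
    return out;
}
```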

Instead of having different size_t variables, can I use just one for searching a std::string multiple times?

I am wondering if it is possible to cut down how many size_t variables I use here. Here is what I have:
std::size_t found1, found2, found3, found4 /* etc */;
Each has its own string to find:
found1 = msg.find("string1");
found2 = msg.find("string2");
found3 = msg.find("string3");
found4 = msg.find("string4");
// etc
If a word is found, the message is discarded instead of being shown:
if (found1 != std::string::npos)
{
SendMsg("You cannot say that word!");
}
I have else if statements up to found21. I'd like to cut all of this down so it's clean, but I have no idea how. I would also like it to lowercase the word. I have never used tolower either, so I would appreciate it if someone could help me.
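To lowercase a string, you can do the following (with <algorithm> and <cctype> included). Passing std::tolower directly can be ambiguous because it is also overloaded in <locale>, and it is undefined for negative char values, so wrap it in a lambda that takes unsigned char:
std::transform(msg.begin(), msg.end(), msg.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });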
std::transform takes a begin and end iterator as its first and second arguments. For each element in that range, it applies the fourth argument (a function), writes the result to where the third iterator points, and advances that iterator. Passing msg.begin() as both the first and third arguments makes it write each result back over the character it came from. So transform basically does this:
for (auto src = msg.begin(), dst = msg.begin(); src != msg.end(); ++src, ++dst)
    *dst = static_cast<char>(std::tolower(static_cast<unsigned char>(*src)));
but using transform is so much nicer.
To check whether a string contains any of a list of substrings, you can use a for loop with a vector:
std::vector<std::string> bad_strings { "bad word 1", "bad word 2", "etc" };
for (const auto& bad : bad_strings) {
    // find returns npos when there is no match; any other value (including 0) means found
    if (msg.find(bad) != std::string::npos) {
        SendMsg("You cannot say that word!");
        break; // stop when the message matches even one bad string
    }
}
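Combining the two parts, here is a minimal sketch that lowercases the message once and checks it against the whole list with std::any_of. The names to_lower and contains_bad_word are hypothetical, and the entries in the list are assumed to be lowercase already:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Returns a lowercase copy; the unsigned char parameter keeps std::tolower
// well defined for characters with negative char values.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if the lowercased message contains any of the (lowercase) bad words.
bool contains_bad_word(const std::string& msg, const std::vector<std::string>& bad_words) {
    const std::string lowered = to_lower(msg);
    return std::any_of(bad_words.begin(), bad_words.end(),
                       [&lowered](const std::string& w) {
                           return lowered.find(w) != std::string::npos;
                       });
}
```

With that, the 21 separate variables collapse to a single check such as if (contains_bad_word(msg, bad_strings)) SendMsg("You cannot say that word!");, where SendMsg is the function from the question.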