Find index of first match using C++ regex - c++

I'm trying to write a split function in C++ using regexes. So far I've come up with this;
vector<string> split(string s, regex r)
{
vector<string> splits;
while (regex_search(s, r))
{
int split_on = // index of regex match
splits.push_back(s.substr(0, split_on));
s = s.substr(split_on + 1);
}
splits.push_back(s);
return splits;
}
What I want to know is how to fill in the commented line.

You'll need just a little more than that, but see the comments in the code below. The man trick is to use a match object, here std::smatch because you're matching on a std::string, to remember where you matched (not just that you did):
vector<string> split(string s, regex r)
{
vector<string> splits;
smatch m; // <-- need a match object
while (regex_search(s, m, r)) // <-- use it here to get the match
{
int split_on = m.position(); // <-- use the match position
splits.push_back(s.substr(0, split_on));
s = s.substr(split_on + m.length()); // <-- also, skip the whole match
}
if(!s.empty()) {
splits.push_back(s); // and there may be one last token at the end
}
return splits;
}
This can be used like so:
auto v = split("foo1bar2baz345qux", std::regex("[0-9]+"));
and will give you "foo", "bar", "baz", "qux".
std::smatch is a specialization of std::match_results, for which reference documentation exists here.

Related

How can I use the C++ regex library to find a match and *then* replace it?

I am writing what amounts to a tiny DSL in which each script is read from a single string, like this:
"func1;func2;func1;4*func3;func1"
I need to expand the loops, so that the expanded script is:
"func1;func2;func1;func3;func3;func3;func3;func1"
I have used the C++ standard regex library with the following regex to find those loops:
regex REGEX_SIMPLE_LOOP(":?[0-9]+)\\*([_a-zA-Z][_a-zA-Z0-9]*;");
smatch match;
bool found = std::regex_search(*this, match, std::regex(REGEX_SIMPLE_LOOP));
Now, it's not too difficult to read out the loop multiplier and print the function N times, but how do I then replace the original match with this string? I want to do this:
if (found) match[0].replace(new_string);
But I don't see that the library can do this.
My backup place is to regex_search, then construct the new string, and then use regex_replace, but it seems clunky and inefficient and not nice to essentially do two full searches like that. Is there a cleaner way?
You can also NOT use regex, the parsing isn't too difficult.
So regex might be overkill. Demo here : https://onlinegdb.com/RXLqLtrUQ-
(and yes my output gives an extra ; at the end)
#include <string>
#include <sstream>
#include <iostream>
int main()
{
std::istringstream is{ "func1;func2;func1;4*func3;func1" };
std::string split;
// use getline to split
while (std::getline(is, split, ';'))
{
// assume 1 repeat
std::size_t count = 1;
// if split part starts with a digit
if (std::isdigit(split.front()))
{
// look for a *
auto pos = split.find('*');
// the first part of the string contains the repeat count
auto count_str = split.substr(0, pos);
// convert that to a value
count = std::stoi(count_str);
// and keep the rest ("funcn")
split = split.substr(pos + 1, split.size() - pos - 1);
}
// now use the repeat count to build the output string
for (std::size_t n = 0; n < count; ++n)
{
std::cout << split << ";";
}
}
// TODO invalid input string handling.
return 0;
}

C++ regex replace whole word

I have a small game to do in which I need to sometimes replace some group of characters with the name of the player in the sentences.
For example, I could have a sentence like :
"[Player]! Are you okay? A plane crash happened, it's on fire!"
And I need to replace the "[Player]" with some name contained in a std::string.
I have been looking for about 20 minutes in other SO questions and in the CPP reference and I really can't understand how to use the regex.
I would like to know how I can replace all instances of the "[Player]" string in a std::string.
Personally I would not use regex for this. A simple search and replace should be enough.
These are (roughly) the functions I use:
// change the string in-place
std::string& replace_all_mute(std::string& s,
const std::string& from, const std::string& to)
{
if(!from.empty())
for(std::size_t pos = 0; (pos = s.find(from, pos) + 1); pos += to.size())
s.replace(--pos, from.size(), to);
return s;
}
// return a copy of the string
std::string replace_all_copy(std::string s,
const std::string& from, const std::string& to)
{
return replace_all_mute(s, from, to);
}
int main()
{
std::string s = "[Player]! Are you okay? A plane crash happened, it's on fire!";
replace_all_mute(s, "[Player]", "Uncle Bob");
std::cout << s << '\n';
}
Output:
Uncle Bob! Are you okay? A plane crash happened, it's on fire!
Regex is meant for more complex patterns. Consider, for example, that instead of simply matching [Player], you wanted to match anything between brackets. That would be a good use for regex.
Following is an example that does just that. Unfortunately, the interface of <regex> is not flexible enough to enable dynamic replacements, so we have to implement the actual replacing ourselves.
#include <iostream>
#include <regex>
int main() {
// Anything stored here can be replaced in the string.
std::map<std::string, std::string> vars {
{"Player1", "Bill"},
{"Player2", "Ted"}
};
// Matches anything between brackets.
std::regex r(R"(\[([^\]]+?)\])");
std::string str = "[Player1], [Player1]! Are you okay? [Player2] said that a plane crash happened!";
// We need to keep track of where we are, or else we would need to search from the start of
// the string everytime, which is very wasteful.
// std::regex_iterator won't help, because the replacement may be smaller
// than the match, and it would cause strings like "[Player1][Player1]" to not match properly.
auto pos=str.cbegin();
do {
// First, we try to get a match. If there's no more matches, exit.
std::smatch m;
regex_search(pos, str.cend(), m, r);
if (m.empty()) break;
// The interface of std::match_results is terrible. Let's get what we need and
// place it in apropriately named variables.
auto var_name = m[1].str();
auto start = m[0].first;
auto end = m[0].second;
auto value = vars[var_name];
// This does the actual replacement
str.replace(start, end, value);
// We update our position. The new search will start right at the end of the replacement.
pos = m[0].first + value.size();
} while(true);
std::cout << str;
}
Output:
Bill, Bill! Are you okay? Ted said that a plane crash happened!
See it live on Coliru
Simply find and replace, e.g. boost::replace_all()
#include <boost/algorithm/string.hpp>
std::string target(""[Player]! Are you okay? A plane crash happened, it's on fire!"");
boost::replace_all(target, "[Player]", "NiNite");
As some people have mentioned, find and replace might be more useful for this scenario, you could do something like this.
std::string name = "Bill";
std::string strToFind = "[Player]";
std::string str = "[Player]! Are you okay? A plane crash happened, it's on fire!";
str.replace(str.find(strToFind), strToFind.length(), name);

Save all regex matches on a vector

So, I need to create a function that gets all occurrence matches on one string based on a regex, then store them in an array to ultimately choose an arbitrary capture group number within an individual match. I tried this:
std::string match(std::string basestring, std::string regex, int index, int group) {
std::vector<std::smatch> match;
(here I would need to create a while statement that iterates over all matches, but I'm not sure what overload of 'regex_search' I have to use)
return match.at(index)[group]; }
I thought of getting a match and then starting to search just next to the end position of that match, in order to get the next one, when no match was found we assume that there are no more matches, and so the while statement is over, then the index and group arguments would get the desired capture group within a match. But I can't seem to find a 'regex_search' overload that requires a starting (or starting and end) positions as well as requiring the target string.
I found the solution myself after some hours of digging, this code will do the job:
std::string match(std::string s, std::string r, int index = 0, int group = 0) {
std::vector<std::smatch> match;
std::regex rx(r);
auto matches_begin = std::sregex_iterator(s.begin(), s.end(), rx);
auto matches_end = std::sregex_iterator();
for (std::sregex_iterator i = matches_begin; i != matches_end; ++i) { match.push_back(*i); }
return match.at(index)[group]; }

C++ Get String between two delimiter String

Is there any inbuilt function available two get string between two delimiter string in C/C++?
My input look like
_STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_
And my output should be
_0_192.168.1.18_
Thanks in advance...
You can do as:
string str = "STARTDELIMITER_0_192.168.1.18_STOPDELIMITER";
unsigned first = str.find(STARTDELIMITER);
unsigned last = str.find(STOPDELIMITER);
string strNew = str.substr (first,last-first);
Considering your STOPDELIMITER delimiter will occur only once at the end.
EDIT:
As delimiter can occur multiple times, change your statement for finding STOPDELIMITER to:
unsigned last = str.find_last_of(STOPDELIMITER);
This will get you text between the first STARTDELIMITER and LAST STOPDELIMITER despite of them being repeated multiple times.
I have no idea how the top answer received so many votes that it did when the question clearly asks how to get a string between two delimiter strings, and not a pair of characters.
If you would like to do so you need to account for the length of the string delimiter, since it will not be just a single character.
Case 1: Both delimiters are unique:
Given a string _STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_ that you want to extract _0_192.168.1.18_ from, you could modify the top answer like so to get the desired effect. This is the simplest solution without introducing extra dependencies (e.g Boost):
#include <iostream>
#include <string>
std::string get_str_between_two_str(const std::string &s,
const std::string &start_delim,
const std::string &stop_delim)
{
unsigned first_delim_pos = s.find(start_delim);
unsigned end_pos_of_first_delim = first_delim_pos + start_delim.length();
unsigned last_delim_pos = s.find(stop_delim);
return s.substr(end_pos_of_first_delim,
last_delim_pos - end_pos_of_first_delim);
}
int main() {
// Want to extract _0_192.168.1.18_
std::string s = "_STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_";
std::string s2 = "ABC123_STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_XYZ345";
std::string start_delim = "_STARTDELIMITER";
std::string stop_delim = "STOPDELIMITER_";
std::cout << get_str_between_two_str(s, start_delim, stop_delim) << std::endl;
std::cout << get_str_between_two_str(s2, start_delim, stop_delim) << std::endl;
return 0;
}
Will print _0_192.168.1.18_ twice.
It is necessary to add the position of the first delimiter in the second argument to std::string::substr as last - (first + start_delim.length()) to ensure that the it would still extract the desired inner string correctly in the event that the start delimiter is not located at the very beginning of the string, as demonstrated in the second case above.
See the demo.
Case 2: Unique first delimiter, non-unique second delimiter:
Say you want to get a string between a unique delimiter and the first non unique delimiter encountered after the first delimiter. You could modify the above function get_str_between_two_str to use find_first_of instead to get the desired effect:
std::string get_str_between_two_str(const std::string &s,
const std::string &start_delim,
const std::string &stop_delim)
{
unsigned first_delim_pos = s.find(start_delim);
unsigned end_pos_of_first_delim = first_delim_pos + start_delim.length();
unsigned last_delim_pos = s.find_first_of(stop_delim, end_pos_of_first_delim);
return s.substr(end_pos_of_first_delim,
last_delim_pos - end_pos_of_first_delim);
}
If instead you want to capture any characters in between the first unique delimiter and the last encountered second delimiter, like what the asker commented above, use find_last_of instead.
Case 3: Non-unique first delimiter, unique second delimiter:
Very similar to case 2, just reverse the logic between the first delimiter and second delimiter.
Case 4: Both delimiters are not unique:
Again, very similar to case 2, make a container to capture all strings between any of the two delimiters. Loop through the string and update the first delimiter's position to be equal to the second delimiter's position when it is encountered and add the string in between to the container. Repeat until std::string:npos is reached.
To get a string between 2 delimiter strings without white spaces.
string str = "STARTDELIMITER_0_192.168.1.18_STOPDELIMITER";
string startDEL = "STARTDELIMITER";
// this is really only needed for the first delimiter
string stopDEL = "STOPDELIMITER";
unsigned firstLim = str.find(startDEL);
unsigned lastLim = str.find(stopDEL);
string strNew = str.substr (firstLim,lastLim);
//This won't exclude the first delimiter because there is no whitespace
strNew = strNew.substr(firstLim + startDEL.size())
// this will start your substring after the delimiter
I tried combining the two substring functions but it started printing the STOPDELIMITER
Hope that helps
Hope you won't mind I'm answering by another question :)
I would use boost::split or boost::split_iter.
http://www.boost.org/doc/libs/1_54_0/doc/html/string_algo/usage.html#idp166856528
For example code see this SO question:
How to avoid empty tokens when splitting with boost::iter_split?
Let's say you need to get 5th argument (brand) from output below:
zoneid:zonename:state:zonepath:uuid:brand:ip-type:r/w:file-mac-profile
You cannot use any "str.find" function, because it is in the middle, but you can use 'strtok'. e.g.
char *brand;
brand = strtok( line, ":" );
for (int i=0;i<4;i++) {
brand = strtok( NULL, ":" );
}
This is a late answer, but this might work too:
string strgOrg= "STARTDELIMITER_0_192.168.1.18_STOPDELIMITER";
string strg= strgOrg;
strg.replace(strg.find("STARTDELIMITER"), 14, "");
strg.replace(strg.find("STOPDELIMITER"), 13, "");
Hope it works for others.
void getBtwString(std::string oStr, std::string sStr1, std::string sStr2, std::string &rStr)
{
int start = oStr.find(sStr1);
if (start >= 0)
{
string tstr = oStr.substr(start + sStr1.length());
int stop = tstr.find(sStr2);
if (stop >1)
rStr = oStr.substr(start + sStr1.length(), stop);
else
rStr ="error";
}
else
rStr = "error"; }
or if you are using Windows and have access to c++14, the following,
void getBtwString(std::string oStr, std::string sStr1, std::string sStr2, std::string &rStr)
{
using namespace std::literals::string_literals;
auto start = sStr1;
auto end = sStr2;
std::regex base_regex(start + "(.*)" + end);
auto example = oStr;
std::smatch base_match;
std::string matched;
if (std::regex_search(example, base_match, base_regex)) {
if (base_match.size() == 2) {
matched = base_match[1].str();
}
rStr = matched;
}
}
Example:
string strout;
getBtwString("it's_12345bb2","it's","bb2",strout);
getBtwString("it's_12345bb2"s,"it's"s,"bb2"s,strout); // second solution
Headers:
#include <regex> // second solution
#include <string.h>

Boost regex token iterator: getting input between parentheses

I'm using the following function with Boost::tr1::sregex_token_iterator
int regexMultiple(std::string **s, std::string r)
{
std::tr1::regex term=(std::tr1::regex)r;
const std::tr1::sregex_token_iterator end;
int nCountOcurrences;
std::string sTemp=**s;
for (std::tr1::sregex_token_iterator i(sTemp.begin(),sTemp.end(), term); i != end; ++i)
{
(*s)[nCountOcurrences]=*i;
nCountOcurrences++;
}
return nCountOcurrences;
}
As you can suppose, **s is a pointer to a string, and r is the regex in question. This function works (in fact, this one might not work because I modified it from the original just to make it simpler, given that the rest is not relevant to the question).
What I want to know is, given, for example, a regex of this kind: "Email: (.*?) Phone:...", is there any way to retrieve only the (.*?) part from it, or should I apply substrings over the given result to achieve this instead?
Else, it's going to throw out: Email: myemail#domain.com Phone: ..
Thanks.
Should use regex_search like Kerrek SB recommended instead: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
int regexMultiple(std::string **s, std::string r)
{
std::tr1::regex term=(std::tr1::regex)r;
std::string::const_iterator start, end;
boost::match_results<std::string::const_iterator> what;
int nCountOcurrences=0;
std::string sTemp=**s;
start=sTemp.begin();
end=sTemp.end();
boost::match_flag_type flags = boost::match_default;
while (regex_search(start,end, what, term, flags))
{
(*s)[nCountOcurrences]=what[1];
nCountOcurrences++;
start = what[0].second;
flags |= boost::match_prev_avail;
flags |= boost::match_not_bob;
}
return nCountOcurrences;
}