C++ Regular Expressions with Boost Regex - c++

I am trying to take a string in C++ and find all IP addresses contained inside, and put them into a new vector string.
I've read a lot of documentation on regex, but I just can't seem to understand how to do this simple function.
I believe I can use this Perl expression to find any IP address:
re("\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b");
But I am still stumped on how to do the rest.

Perhaps you're looking for something like this. It uses regex_iterator to get all matches of the current pattern. See reference.
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main()
{
std::string text(" 192.168.0.1 abc 10.0.0.255 10.5.1 1.2.3.4a 5.4.3.2 ");
const char* pattern =
"\\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
"\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
"\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
"\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b";
boost::regex ip_regex(pattern);
boost::sregex_iterator it(text.begin(), text.end(), ip_regex);
boost::sregex_iterator end;
for (; it != end; ++it) {
std::cout << it->str() << "\n";
// v.push_back(it->str()); or something similar
}
}
Output:
192.168.0.1
10.0.0.255
5.4.3.2
Side note: you probably meant \\b instead of \b; I doubt you watnted to match backspace character.

The offered solution is quite good, thanks for it. Though I found a slight mistake in the pattern itself.
For example, something like 49.000.00.01 would be taken as a valid IPv4 address and from my understanding, it shouldn't be (just happened to me during some dump processing).
I suggest to improve the patter into:
"\\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)\\b";
This should allow only 0.0.0.0 as the all-zero-in, which I suppose to be correct and it will eliminate all .00. .000. etc.

#include <string>
#include <list>
#include <boost/regex.hpp>
typedef std::string::const_iterator ConstIt;
int main()
{
// input text, expected result, & proper address pattern
const std::string sInput
(
"192.168.0.1 10.0.0.255 abc 10.5.1.00"
" 1.2.3.4a 168.72.0 0.0.0.0 5.4.3.2"
);
const std::string asExpected[] =
{
"192.168.0.1",
"10.0.0.255",
"0.0.0.0",
"5.4.3.2"
};
boost::regex regexIPs
(
"(^|[ \t])("
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])"
")($|[ \t])"
);
// parse, check results, and return error count
boost::smatch what;
std::list<std::string> ns;
ConstIt end = sInput.end();
for (ConstIt begin = sInput.begin();
boost::regex_search(begin, end, what, regexIPs);
begin = what[0].second)
{
ns.push_back(std::string(what[2].first, what[2].second));
}
// check results and return number of errors (zero)
int iErrors = 0;
int i = 0;
for (std::string & s : ns)
if (s != asExpected[i ++])
++ iErrors;
return iErrors;
}

Related

How can I use the C++ regex library to find a match and *then* replace it?

I am writing what amounts to a tiny DSL in which each script is read from a single string, like this:
"func1;func2;func1;4*func3;func1"
I need to expand the loops, so that the expanded script is:
"func1;func2;func1;func3;func3;func3;func3;func1"
I have used the C++ standard regex library with the following regex to find those loops:
regex REGEX_SIMPLE_LOOP(":?[0-9]+)\\*([_a-zA-Z][_a-zA-Z0-9]*;");
smatch match;
bool found = std::regex_search(*this, match, std::regex(REGEX_SIMPLE_LOOP));
Now, it's not too difficult to read out the loop multiplier and print the function N times, but how do I then replace the original match with this string? I want to do this:
if (found) match[0].replace(new_string);
But I don't see that the library can do this.
My backup place is to regex_search, then construct the new string, and then use regex_replace, but it seems clunky and inefficient and not nice to essentially do two full searches like that. Is there a cleaner way?
You can also NOT use regex, the parsing isn't too difficult.
So regex might be overkill. Demo here : https://onlinegdb.com/RXLqLtrUQ-
(and yes my output gives an extra ; at the end)
#include <string>
#include <sstream>
#include <iostream>
int main()
{
std::istringstream is{ "func1;func2;func1;4*func3;func1" };
std::string split;
// use getline to split
while (std::getline(is, split, ';'))
{
// assume 1 repeat
std::size_t count = 1;
// if split part starts with a digit
if (std::isdigit(split.front()))
{
// look for a *
auto pos = split.find('*');
// the first part of the string contains the repeat count
auto count_str = split.substr(0, pos);
// convert that to a value
count = std::stoi(count_str);
// and keep the rest ("funcn")
split = split.substr(pos + 1, split.size() - pos - 1);
}
// now use the repeat count to build the output string
for (std::size_t n = 0; n < count; ++n)
{
std::cout << split << ";";
}
}
// TODO invalid input string handling.
return 0;
}

C++ regex replace whole word

I have a small game to do in which I need to sometimes replace some group of characters with the name of the player in the sentences.
For example, I could have a sentence like :
"[Player]! Are you okay? A plane crash happened, it's on fire!"
And I need to replace the "[Player]" with some name contained in a std::string.
I have been looking for about 20 minutes in other SO questions and in the CPP reference and I really can't understand how to use the regex.
I would like to know how I can replace all instances of the "[Player]" string in a std::string.
Personally I would not use regex for this. A simple search and replace should be enough.
These are (roughly) the functions I use:
// change the string in-place
std::string& replace_all_mute(std::string& s,
const std::string& from, const std::string& to)
{
if(!from.empty())
for(std::size_t pos = 0; (pos = s.find(from, pos) + 1); pos += to.size())
s.replace(--pos, from.size(), to);
return s;
}
// return a copy of the string
std::string replace_all_copy(std::string s,
const std::string& from, const std::string& to)
{
return replace_all_mute(s, from, to);
}
int main()
{
std::string s = "[Player]! Are you okay? A plane crash happened, it's on fire!";
replace_all_mute(s, "[Player]", "Uncle Bob");
std::cout << s << '\n';
}
Output:
Uncle Bob! Are you okay? A plane crash happened, it's on fire!
Regex is meant for more complex patterns. Consider, for example, that instead of simply matching [Player], you wanted to match anything between brackets. That would be a good use for regex.
Following is an example that does just that. Unfortunately, the interface of <regex> is not flexible enough to enable dynamic replacements, so we have to implement the actual replacing ourselves.
#include <iostream>
#include <regex>
int main() {
// Anything stored here can be replaced in the string.
std::map<std::string, std::string> vars {
{"Player1", "Bill"},
{"Player2", "Ted"}
};
// Matches anything between brackets.
std::regex r(R"(\[([^\]]+?)\])");
std::string str = "[Player1], [Player1]! Are you okay? [Player2] said that a plane crash happened!";
// We need to keep track of where we are, or else we would need to search from the start of
// the string everytime, which is very wasteful.
// std::regex_iterator won't help, because the replacement may be smaller
// than the match, and it would cause strings like "[Player1][Player1]" to not match properly.
auto pos=str.cbegin();
do {
// First, we try to get a match. If there's no more matches, exit.
std::smatch m;
regex_search(pos, str.cend(), m, r);
if (m.empty()) break;
// The interface of std::match_results is terrible. Let's get what we need and
// place it in apropriately named variables.
auto var_name = m[1].str();
auto start = m[0].first;
auto end = m[0].second;
auto value = vars[var_name];
// This does the actual replacement
str.replace(start, end, value);
// We update our position. The new search will start right at the end of the replacement.
pos = m[0].first + value.size();
} while(true);
std::cout << str;
}
Output:
Bill, Bill! Are you okay? Ted said that a plane crash happened!
See it live on Coliru
Simply find and replace, e.g. boost::replace_all()
#include <boost/algorithm/string.hpp>
std::string target(""[Player]! Are you okay? A plane crash happened, it's on fire!"");
boost::replace_all(target, "[Player]", "NiNite");
As some people have mentioned, find and replace might be more useful for this scenario, you could do something like this.
std::string name = "Bill";
std::string strToFind = "[Player]";
std::string str = "[Player]! Are you okay? A plane crash happened, it's on fire!";
str.replace(str.find(strToFind), strToFind.length(), name);

C++: Regex: returns full string and not matched group

for those asking, the {0} allows selection of any one block within the sResult string separated by the | 0 is the first block
it needs to be dynamic for future expansion as that number will be configurable by users
So I am working on a regex to extract 1 portion of a string, however while it matches the results return are not what is expected.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
for( int i = 0; i < regMatch.size(); i++)
{
//SUBMATCH 0 = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE"
//SUBMATCH 1 = "BUT|NOT|ANYTHNG|ELSE"
std::ssub_match sm = regMatch[i];
bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
For some reason I cannot figure out the code to get me just the MATCH_ME back so I can compare it to expected results list on the C++ side.
Anyone have any ideas on where I went wrong here.
It seems you're using regular expressions for what they haven't been designed for. You should first split your string at the delimiter | and apply regular expressions on the resulting tokens if you want to check them for validity.
By the way: The std::regex implementation in libstdc++ seems to be buggy. I just did some tests and found that even simple patterns containing escaped pipe characters like \\| failed to compile throwing a std::regex_error with no further information in the error message (GCC 4.8.1).
The following code example shows how to do what you are after - you compile this, then call it with a single numerical argument to extract that element of the input:
#include <iostream>
#include <cstring>
#include <regex>
int main(int argc, char *argv[]) {
char pat[100];
if (argc > 1) {
sprintf(pat, "^(?:[^|]+[|]){%s}([^|;]+)", argv[1]);
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern(pat);
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::ssub_match sm = regMatch[1];
std::cout << "The match is " << sm << std::endl;
//bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
return 0;
}
Creating an executable called match, you can then do
>> match 2
The match is NOT
which is what you wanted.
The regex, it turns out, works just fine - although as a matter of preference I would use \| instead of [|] for the first part.
Turns out the problem was on the C side in extracting the match, it had to be done more directly, below is the code that gets me exactly what I wanted out of the string so I can use it later.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::string theMatchedPortion = regMatch[1];
//the issue was not with the regex but in how I was retrieving the results.
//theMatchedPortion now equals "MATCH_ME" and by changing the number associated
with it I can navigate through the string
}

iterating over boost regex_iterator results

I need some help to understand how to iterate over the search results from a boost::sregex_iterator. Basically I am passing in a ';' delimited set of IP addresses from the command line, and I would like to be able to process each IP address in turn using a boost::sregex_iterator.
The code below demonstrates what I am trying to do and also shows a workaround using the workingIterRegex - however the workaround limits the richness of my regular expression. I tried modifying the nonworkingIterRegex however it only returns the last IP address to the lambda.
Does anyone know how I can loop over each IP address individually without having to resort to such a hacked and simplistic workingIterRegex.
I found the following http://www.cs.ucr.edu/~cshelton/courses/cppsem/regex2.cc to show how to call the lambda with the individual sub matches.
I also used the example in looping through sregex_iterator results to get access to the sub matches however it gave similar results.
After using the workingIterRegex the code prints out the IP addresses one per line
#include <string>
#include <boost/regex.hpp>
...
std::string errorMsg;
std::string testStr("192.168.1.1;192.168.33.1;192.168.34.1;192.168.2.1");
static const boost::regex nonworkingIterregex (
"((\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3});?)+",
boost::regex_constants::icase);
boost::smatch match;
if (boost::regex_match(testStr, match, nonworkingIterregex)) {
static const boost::regex workingIterRegex("[^;]+");
std::for_each(boost::sregex_iterator(
begin(iter->second), end(iter->second), workingIterRegex),
boost::sregex_iterator(),
[](const boost::smatch &match){
std::cout << match.str() << std::endl;
}
);
mNICIPAddrs = UtlStringUtils::tokenize(iter->second, ";");
std::string errorMsg;
} else {
errorMsg = "Malformed CLI Arg:" + iter->second;
}
if (!errorMsg.empty()) {
throw std::invalid_argument(errorMsg);
}
...
After some experimentation I found that the following worked - but I am not sure why the c
Try using your regex expression like this:
(\\d{1,3}\\.){3}(\\d{1,3})

Getting sub-match_results with boost::regex

Hey, let's say I have this regex: (test[0-9])+
And that I match it against: test1test2test3test0
const bool ret = boost::regex_search(input, what, r);
for (size_t i = 0; i < what.size(); ++i)
cout << i << ':' << string(what[i]) << "\n";
Now, what[1] will be test0 (the last occurrence). Let's say that I need to get test1, 2 and 3 as well: what should I do?
Note: the real regex is extremely more complex and has to remain one overall match, so changing the example regex to (test[0-9]) won't work.
I think Dot Net has the ability to make single capture group Collections so that (grp)+ will create a collection object on group1. The boost engine's regex_search() is going to be just like any ordinary match function. You sit in a while() loop matching the pattern where the last match left off. The form you used does not use a bid-itterator, so the function won't start the next match where the last match left off.
You can use the itterator form:
(Edit - you can also use the token iterator, defining what groups to iterate over. Added in the code below).
#include <boost/regex.hpp>
#include <string>
#include <iostream>
using namespace std;
using namespace boost;
int main()
{
string input = "test1 ,, test2,, test3,, test0,,";
boost::regex r("(test[0-9])(?:$|[ ,]+)");
boost::smatch what;
std::string::const_iterator start = input.begin();
std::string::const_iterator end = input.end();
while (boost::regex_search(start, end, what, r))
{
string stest(what[1].first, what[1].second);
cout << stest << endl;
// Update the beginning of the range to the character
// following the whole match
start = what[0].second;
}
// Alternate method using token iterator
const int subs[] = {1}; // we just want to see group 1
boost::sregex_token_iterator i(input.begin(), input.end(), r, subs);
boost::sregex_token_iterator j;
while(i != j)
{
cout << *i++ << endl;
}
return 0;
}
Output:
test1
test2
test3
test0
Boost.Regex offers experimental support for exactly this feature (called repeated captures); however, since it's huge performance hit, this feature is disabled by default.
To enable repeated captures, you need to rebuild Boost.Regex and define macro BOOST_REGEX_MATCH_EXTRA in all translation units; the best way to do this is to uncomment this define in boost/regex/user.hpp (see the reference, it's at the very bottom of the page).
Once compiled with this define, you can use this feature by calling/using regex_search, regex_match and regex_iterator with match_extra flag.
Check reference to Boost.Regex for more info.
Seems to me like you need to create a regex_iterator, using the (test[0-9]) regex as input. Then you can use the resulting regex_iterator to enumerate the matching substrings of your original target.
If you still need "one overall match" then perhaps that work has to be decoupled from the task of finding matching substrings. Can you clarify that part of your requirement?