iterating over boost regex_iterator results - c++

I need some help understanding how to iterate over the search results from a boost::sregex_iterator. I am passing in a ';'-delimited set of IP addresses from the command line, and I would like to process each IP address in turn using a boost::sregex_iterator.
The code below demonstrates what I am trying to do and also shows a workaround using workingIterRegex; however, the workaround limits the richness of my regular expression. I tried modifying nonworkingIterregex, but it only returns the last IP address to the lambda.
Does anyone know how I can loop over each IP address individually without resorting to such a hacked and simplistic workingIterRegex?
I found http://www.cs.ucr.edu/~cshelton/courses/cppsem/regex2.cc, which shows how to call the lambda with the individual sub-matches.
I also used the example in "looping through sregex_iterator results" to get access to the sub-matches; however, it gave similar results.
With workingIterRegex, the code prints out the IP addresses one per line:
#include <algorithm>
#include <iostream>
#include <stdexcept>
#include <string>
#include <boost/regex.hpp>
...
std::string errorMsg;
std::string testStr("192.168.1.1;192.168.33.1;192.168.34.1;192.168.2.1");
static const boost::regex nonworkingIterregex (
"((\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3});?)+",
boost::regex_constants::icase);
boost::smatch match;
if (boost::regex_match(testStr, match, nonworkingIterregex)) {
static const boost::regex workingIterRegex("[^;]+");
std::for_each(boost::sregex_iterator(
begin(iter->second), end(iter->second), workingIterRegex),
boost::sregex_iterator(),
[](const boost::smatch &match){
std::cout << match.str() << std::endl;
}
);
mNICIPAddrs = UtlStringUtils::tokenize(iter->second, ";");
} else {
errorMsg = "Malformed CLI Arg:" + iter->second;
}
if (!errorMsg.empty()) {
throw std::invalid_argument(errorMsg);
}
...
After some experimentation I found that the following worked - but I am not sure why the c

Try writing your regular expression like this, so that it matches a single address:
(\\d{1,3}\\.){3}(\\d{1,3})
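A repeated group such as (...)+ keeps only its last iteration in the match results, which is why only the final address reached the lambda. The sketch below shows the alternative: a pattern that matches one address, with the iterator supplying the repetition. It assumes std::regex from the standard library (its sregex_iterator mirrors the Boost one), and the function name extract_ips is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every dotted-quad substring found in `input`.
// The pattern matches ONE address; the iterator supplies the repetition.
std::vector<std::string> extract_ips(const std::string& input) {
    static const std::regex one_ip("(\\d{1,3}\\.){3}\\d{1,3}");
    std::vector<std::string> result;
    for (std::sregex_iterator it(input.begin(), input.end(), one_ip), end;
         it != end; ++it) {
        result.push_back(it->str());  // whole match, i.e. a single address
    }
    return result;
}
```

Called on the test string from the question, this should yield the four addresses in order. Validating the whole argument with regex_match can still be done as a separate step beforehand.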

Related

C++: Regex: returns full string and not matched group

For those asking: the {0} selects one block within the sResult string, where blocks are separated by |; 0 is the first block.
It needs to be dynamic for future expansion, as that number will be configurable by users.
I am working on a regex to extract one portion of a string; however, while it matches, the results returned are not what I expected.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
for( int i = 0; i < regMatch.size(); i++)
{
//SUBMATCH 0 = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE"
//SUBMATCH 1 = "BUT|NOT|ANYTHNG|ELSE"
std::ssub_match sm = regMatch[i];
bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
For some reason I cannot figure out the code that gets me just MATCH_ME back, so that I can compare it to the expected results list on the C++ side.
Does anyone have any ideas on where I went wrong here?
It seems you're using regular expressions for something they were not designed for. You should first split your string at the delimiter | and then apply regular expressions to the resulting tokens if you want to check them for validity.
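The split-first approach can be sketched as follows. This is only an illustration using std::getline from the standard library; the helper name split is hypothetical, and a trailing empty field after a final delimiter is not reported.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` at every occurrence of `delim`.
std::vector<std::string> split(const std::string& input, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string token;
    while (std::getline(stream, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

With the string from the question, split(sResult, '|') gives five tokens, and the user-configurable block number becomes a plain index into the vector.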
By the way: the std::regex implementation in libstdc++ seems to be buggy. I ran some tests and found that even simple patterns containing escaped pipe characters like \\| failed to compile, throwing a std::regex_error with no further information in the error message (GCC 4.8.1; <regex> support in libstdc++ was only completed in GCC 4.9).
The following code example shows how to do what you are after: compile it, then call it with a single numeric argument to extract that element of the input:
#include <iostream>
#include <regex>
#include <string>
int main(int argc, char *argv[]) {
    if (argc > 1) {
        // Use the requested block index as the repeat count of the prefix group.
        // argv[1] is assumed to be a non-negative integer.
        const std::string pat =
            std::string("^(?:[^|]+[|]){") + argv[1] + "}([^|;]+)";
        const std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
        std::regex pattern(pat);
        std::smatch regMatch;
        if (std::regex_search(sResult, regMatch, pattern) && regMatch[1].matched)
        {
            // Group 1 holds only the selected block, not the whole match.
            std::ssub_match sm = regMatch[1];
            std::cout << "The match is " << sm << std::endl;
            //bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
        }
    }
    return 0;
}
Creating an executable called match, you can then do
>> match 2
The match is NOT
which is what you wanted.
The regex, it turns out, works just fine - although as a matter of preference I would use \| instead of [|] for the first part.
It turns out the problem was on the C++ side, in how I extracted the match; it had to be done more directly. Below is the code that gets me exactly what I wanted out of the string so I can use it later.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::string theMatchedPortion = regMatch[1];
//the issue was not with the regex but in how I was retrieving the results.
//theMatchedPortion now equals "MATCH_ME" and by changing the number associated
//with it I can navigate through the string
}

C++ boost/regex regex_search

Consider the following string content:
string content = "{'name':'Fantastic gloves','description':'Theese gloves will fit any time period.','current':{'trend':'high','price':'47.1000'}";
I have never used regex_search and I have been searching around for ways to use it - I still do not quite get it. From that random string (it's from an API) how could I grab two things:
1) the price - in this example it is 47.1000
2) the name - in this example Fantastic gloves
From what I have read, regex_search would be the best approach here. I plan on using the price as an integer value, so I will use regex_replace to remove the "." from the string before converting it. I have only used regex_replace, which I found easy to work with; I don't know why I am struggling so much with regex_search.
Key notes:
Content is contained inside ' '
Content ID and value are separated by :
Content/value pairs are separated by ,
The values of the IDs name and price will vary.
My first thought was to locate, for instance, price, then move 3 characters ahead (past ':') and gather everything until the next '. However, I am not sure whether I am completely off track here.
Any help is appreciated.
boost::regex is not needed here. Regular expressions are meant for more general pattern matching, whereas your example is very specific. One way to handle your problem is to break the string up into individual tokens. Here is an example using boost::tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <map>
int main()
{
std::map<std::string, std::string> m;
std::string content = "{'name':'Fantastic gloves','description':'Theese gloves will fit any time period.','current':{'trend':'high','price':'47.1000'}";
boost::char_separator<char> sep("{},':");
boost::tokenizer<boost::char_separator<char>> tokenizer(content, sep);
std::string id;
for (auto tok = tokenizer.begin(); tok != tokenizer.end(); ++tok)
{
// Since "current" is a special case I added code to handle that
if (*tok != "current")
{
id = *tok++;
m[id] = *tok;
}
else
{
id = *++tok;
m[id] = *++tok; // trend
id = *++tok;
m[id] = *++tok; // price
}
}
std::cout << "Name: " << m["name"] << std::endl;
std::cout << "Price: " << m["price"] << std::endl;
}
As the string you are attempting to parse appears to be JSON (JavaScript Object Notation), consider using a specialized JSON parser.
You can find a comprehensive list of JSON parsers in many languages including C++ at http://json.org/. Also, I found a discussion on the merits of several JSON parsers for C++ in response to this SO question.
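For reference, the regex_search approach the question asks about could look roughly like the sketch below. It uses std::regex rather than Boost (the regex_search call has the same shape in both), and the helper name field_value is hypothetical. It assumes the key contains no regex metacharacters and that values never contain a single quote, so it is no substitute for a real JSON parser.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the value stored under `key`, or an empty
// string when the key is absent. Assumes `key` contains no regex
// metacharacters and that values never contain a single quote.
std::string field_value(const std::string& content, const std::string& key) {
    const std::regex re("'" + key + "':'([^']*)'");
    std::smatch m;
    if (std::regex_search(content, m, re)) {
        return m[1].str();  // group 1 is the text between the quotes
    }
    return std::string();
}
```

Calling field_value(content, "price") and field_value(content, "name") on the string from the question returns the two values being asked for; the price can then be converted as planned.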

How to check which matching group was used to match (boost-regex)

I'm using boost::regex to parse a format string in which the '%' symbol is the escape character. Because I do not have much experience with boost::regex, or with regular expressions at all to be honest, I am working by trial and error. This code is a prototype that I came up with.
std::string regex_string =
"(?:%d\\{(.*)\\})|" //this group will catch string for formatting time
"(?:%([hHmMsSqQtTlLcCxXmMnNpP]))|" //symbols that have some meaning
"(?:\\{(.*?)\\})|" //some other groups
"(?:%(.*?)\\s)|"
"(?:([^%]*))";
boost::regex regex;
boost::smatch match;
try
{
regex.assign(regex_string, boost::regex_constants::icase);
boost::sregex_iterator res(pattern.begin(), pattern.end(), regex);
//pattern in line above is string which I'm parsing
boost::sregex_iterator end;
for(; res != end; ++res)
{
match = *res;
output << match.get_last_closed_paren();
//I want to know if the thing that was just written to output is from group describing time string
output << "\n";
}
}
catch(boost::regex_error &e)
{
output<<"regex error\n";
}
This works pretty well: the output contains exactly what I want to catch. But I do not know which group it came from. I could do something like match[index_of_time_group]!="" but this is fragile and doesn't look good. If I change regex_string, the index of the group that catches the time-format string could also change.
Is there a neat way to do this? Something like naming groups? I'll be grateful for any help.
You can use the bool member boost::sub_match::matched:
if(match[index_of_time_group].matched) process_it(match);
It is also possible to use named groups in the regexp, like (?<name_of_group>.*); with this, the line above could be changed to:
if(match["name_of_group"].matched) process_it(match);
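A minimal sketch of the index-based check follows. It uses std::regex, whose sub_match also exposes a matched member, and a cut-down two-alternative version of the pattern from the question. The function names matched_group and classify are invented for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: index of the first capture group that took part in
// the match, or 0 when no group did.
std::size_t matched_group(const std::smatch& m) {
    for (std::size_t i = 1; i < m.size(); ++i) {
        if (m[i].matched) {
            return i;
        }
    }
    return 0;
}

// Group 1 captures a time-format string, group 2 a single format symbol.
std::size_t classify(const std::string& token) {
    static const std::regex re("%d\\{(.*)\\}|%([hHmMsS])");
    std::smatch m;
    return std::regex_search(token, m, re) ? matched_group(m) : 0;
}
```

Checking matched rather than comparing against an empty string also distinguishes a group that matched nothing from a group that did not participate at all.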
Dynamically build regex_string from pairs of name/pattern, and return a name->index mapping as well as the regex. Then write some code that determines if the match comes from a given name.
If you are insane, you can do it at compile time (the mapping from tag to index that is). It isn't worth it.
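The run-time version of that idea might be sketched as below. It assumes each supplied pattern contains exactly one capturing group, so group numbers can be assigned by counting, and it uses std::regex for illustration. NamedRegex, build and came_from are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

struct NamedRegex {
    std::regex re;
    std::map<std::string, std::size_t> index;  // name -> capture group number
};

// Joins the patterns with '|' and records which group belongs to which name.
// Assumes every pattern contains exactly one capturing group.
NamedRegex build(const std::vector<std::pair<std::string, std::string>>& parts) {
    NamedRegex out;
    std::string joined;
    std::size_t group = 1;
    for (const auto& p : parts) {
        if (!joined.empty()) {
            joined += '|';
        }
        joined += "(?:" + p.second + ")";
        out.index[p.first] = group++;
    }
    out.re = std::regex(joined);
    return out;
}

// True when the group registered under `name` took part in the match.
bool came_from(const NamedRegex& nr, const std::smatch& m, const std::string& name) {
    const auto it = nr.index.find(name);
    return it != nr.index.end() && m[it->second].matched;
}
```

Because the name-to-index map is rebuilt whenever the list of patterns changes, reordering or adding alternatives no longer breaks the code that asks whether a match came from the time group.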

C++ Regular Expressions with Boost Regex

I am trying to take a string in C++, find all the IP addresses it contains, and put them into a new vector of strings.
I've read a lot of documentation on regex, but I just can't seem to understand how to do this simple task.
I believe I can use this Perl expression to find any IP address:
re("\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b");
But I am still stumped on how to do the rest.
Perhaps you're looking for something like this. It uses regex_iterator to get all matches of the current pattern. See reference.
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main()
{
std::string text(" 192.168.0.1 abc 10.0.0.255 10.5.1 1.2.3.4a 5.4.3.2 ");
const char* pattern =
"\\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
"\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
"\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
"\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b";
boost::regex ip_regex(pattern);
boost::sregex_iterator it(text.begin(), text.end(), ip_regex);
boost::sregex_iterator end;
for (; it != end; ++it) {
std::cout << it->str() << "\n";
// v.push_back(it->str()); or something similar
}
}
Output:
192.168.0.1
10.0.0.255
5.4.3.2
Side note: you probably meant \\b instead of \b; I doubt you wanted to match a backspace character.
The offered solution is quite good, thanks for it. However, I found a slight mistake in the pattern itself.
For example, something like 49.000.00.01 would be accepted as a valid IPv4 address, and from my understanding it shouldn't be (this just happened to me during some dump processing).
I suggest improving the pattern to:
"\\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)\\b";
This should allow only 0.0.0.0 as the all-zero address, which I believe is correct, and it will eliminate all .00., .000., etc.
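To see the effect of the stricter octet alternative in isolation, here is a small sketch that validates one candidate string with regex_match. It uses std::regex, and the function name is_strict_ipv4 is made up for illustration; the octet sub-pattern is the one suggested above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `s` is exactly one dotted-quad address
// whose octets are 0-255 written without leading zeros.
bool is_strict_ipv4(const std::string& s) {
    static const std::string octet =
        "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)";
    static const std::regex re(octet + "\\." + octet + "\\." + octet + "\\." + octet);
    return std::regex_match(s, re);  // the whole string must match
}
```

With this pattern 49.000.00.01 and 10.5.1.00 are rejected, while 0.0.0.0 and 192.168.0.1 are accepted.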
#include <string>
#include <list>
#include <boost/regex.hpp>
typedef std::string::const_iterator ConstIt;
int main()
{
// input text, expected result, & proper address pattern
const std::string sInput
(
"192.168.0.1 10.0.0.255 abc 10.5.1.00"
" 1.2.3.4a 168.72.0 0.0.0.0 5.4.3.2"
);
const std::string asExpected[] =
{
"192.168.0.1",
"10.0.0.255",
"0.0.0.0",
"5.4.3.2"
};
boost::regex regexIPs
(
"(^|[ \t])("
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
"(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])"
")($|[ \t])"
);
// parse, check results, and return error count
boost::smatch what;
std::list<std::string> ns;
ConstIt end = sInput.end();
for (ConstIt begin = sInput.begin();
boost::regex_search(begin, end, what, regexIPs);
begin = what[0].second)
{
ns.push_back(std::string(what[2].first, what[2].second));
}
// check results and return number of errors (zero)
int iErrors = 0;
int i = 0;
for (std::string & s : ns)
if (s != asExpected[i ++])
++ iErrors;
return iErrors;
}

Getting sub-match_results with boost::regex

Hey, let's say I have this regex: (test[0-9])+
And that I match it against: test1test2test3test0
const bool ret = boost::regex_search(input, what, r);
for (size_t i = 0; i < what.size(); ++i)
cout << i << ':' << string(what[i]) << "\n";
Now, what[1] will be test0 (the last occurrence). Let's say that I need to get test1, test2 and test3 as well: what should I do?
Note: the real regex is far more complex and has to remain one overall match, so changing the example regex to (test[0-9]) won't work.
I think .NET can expose a single capture group as a collection, so that (grp)+ creates a collection object for group 1. The Boost engine's regex_search() behaves like any ordinary match function: you sit in a while() loop, matching the pattern from where the last match left off. The form you used does not take a pair of bidirectional iterators, so the function cannot start the next match where the last one left off.
You can use the iterator form:
(Edit: you can also use the token iterator, specifying which groups to iterate over. Added in the code below.)
#include <boost/regex.hpp>
#include <string>
#include <iostream>
using namespace std;
using namespace boost;
int main()
{
string input = "test1 ,, test2,, test3,, test0,,";
boost::regex r("(test[0-9])(?:$|[ ,]+)");
boost::smatch what;
std::string::const_iterator start = input.begin();
std::string::const_iterator end = input.end();
while (boost::regex_search(start, end, what, r))
{
string stest(what[1].first, what[1].second);
cout << stest << endl;
// Update the beginning of the range to the character
// following the whole match
start = what[0].second;
}
// Alternate method using token iterator
const int subs[] = {1}; // we just want to see group 1
boost::sregex_token_iterator i(input.begin(), input.end(), r, subs);
boost::sregex_token_iterator j;
while(i != j)
{
cout << *i++ << endl;
}
return 0;
}
Output:
test1
test2
test3
test0
Boost.Regex offers experimental support for exactly this feature (called repeated captures); however, because it carries a large performance hit, the feature is disabled by default.
To enable repeated captures, you need to rebuild Boost.Regex and define the macro BOOST_REGEX_MATCH_EXTRA in all translation units; the best way to do this is to uncomment this define in boost/regex/user.hpp (see the reference, it's at the very bottom of the page).
Once compiled with this define, you can use the feature by calling regex_search, regex_match and regex_iterator with the match_extra flag.
Check the Boost.Regex reference for more info.
Seems to me like you need to create a regex_iterator, using the (test[0-9]) regex as input. Then you can use the resulting regex_iterator to enumerate the matching substrings of your original target.
If you still need "one overall match" then perhaps that work has to be decoupled from the task of finding matching substrings. Can you clarify that part of your requirement?
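One way to decouple the two tasks is sketched below: regex_match first confirms the overall match, and a second, simpler pattern then enumerates the repetitions. The sketch uses std::regex and the example pattern from the question; the function name repeated_captures is hypothetical, and whether the real, more complex regex can be split this way is an assumption.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every repetition of the group, or an empty
// vector when the input as a whole does not match (test[0-9])+.
std::vector<std::string> repeated_captures(const std::string& input) {
    static const std::regex whole("(test[0-9])+");
    static const std::regex piece("test[0-9]");
    std::vector<std::string> parts;
    if (!std::regex_match(input, whole)) {
        return parts;  // overall match failed, nothing to enumerate
    }
    for (std::sregex_iterator it(input.begin(), input.end(), piece), end;
         it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

For test1test2test3test0 this returns the four pieces in order, and for an input that fails the overall match it returns nothing.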