Regex matching groups boost c++ - c++

[Noob Corner]
Hello,
I'm trying to catch a group with boost regex depending on the string that matched and I think I'm using a wrong way.
boost::regex expr(R"(:?(:?\busername *(\S*))|(:?\bserver *(\S*))|(:?\bpassword *(\S*)))");
std::vector<std::string > vec = { "server my.server.eu", "username myusername", "password mypassword" };
for (auto &elem : vec)
{
if (boost::regex_match(elem, expr, boost::match_extra))
{
boost::smatch what;
boost::regex_search(elem, what, expr);
std::cout << "Match 1 (username) : " << what[1].str() << std::endl;
std::cout << "Match 2 (server) : " << what[2].str() << std::endl;
std::cout << "Match 3 (password) : " << what[3].str() << std::endl;
}
}
I want something like :
server my.server.eu
Match 1 (username) : NULL
Match 2 (server) : my.server.eu
Match 3 (password) : NULL
I searched on internet but I have not found clear answers regarding the identification of capturing groups.
Thanks

You actually have 6 and not 3 matching groups.
Your regular expression is organized in such a manner that the odd matching groups will match a key-value (i.e.: username myusername) while the even matching groups will match the actual value (i.e.: myusername).
So you have to look for groups 2, 4 and 6 to get the username, server and password values.

Related

QRegularExpression find and capture all quoted and non-quoated parts in string

I am fairly new to using regexes.
I got a string which can contain quoted and not quoted substrings.
Here are examples of how they could look:
"path/to/program.exe" -a -b -c
"path/to/program.exe" -a -b -c
path/to/program.exe "-a" "-b" "-c"
path/to/program.exe "-a" -b -c
My regex looks like this: (("[^"]*")|([^"\t ]+))+
With ("[^"]+") I attempt to find every quoted substring and capture it.
With ([^"\t ]+) I attempt to find every substring without quotes.
My code to test this behaviour looks like this:
QString toMatch = R"del( "path/to/program.exe" -a -b -c)del";
qDebug() << "String to Match against: " << toMatch << "\n";
QRegularExpression re(R"del((("[^"]+")|([^"\t ]+))+)del");
QRegularExpressionMatchIterator it = re.globalMatch(toMatch);
int i = 0;
while (it.hasNext())
{
QRegularExpressionMatch match = it.next();
qDebug() << "iteration: " << i << " captured: " << match.captured(i) << "\n";
i++;
}
Output:
String to Match against: " \"path/to/program.exe\" -a -b -c"
iteration: 0 captured: "\"path/to/program.exe\""
iteration: 1 captured: "-a"
iteration: 2 captured: ""
iteration: 3 captured: "-c"
Testing it in Regex101 shows me the result I want.
I also tested it on some other websites e.g this.
I guess I am doing something wrong, could anyone point in the right direction?
Thanks in advance.
You assume that the groups you need to get value from will change their IDs with each new match, while, in fact, all the groups IDs are set in the pattern itself.
I suggest removing all groups and just extract the whole match value:
QString toMatch = R"del( "path/to/program.exe" -a -b -c)del";
qDebug() << "String to Match against: " << toMatch << "\n";
QRegularExpression re(R"del("[^"]+"|[^"\s]+)del");
QRegularExpressionMatchIterator it = re.globalMatch(toMatch);
while (it.hasNext())
{
QRegularExpressionMatch match = it.next();
qDebug() << " matched: " << match.captured(0) << "\n";
}
Note the "[^"]+"|[^"\s]+ pattern matches either
"[^"]+" - ", then one or more chars other than " and then a "
| - or
[^"\s]+ - one or more chars other than " and whitespace.
See the updated pattern demo.

c++11 (MSVS2012) regex looking for file names in multiple line std::string

I have been trying to search for a clear answer on this one, but not been able to find it.
So lets say I have the string (where \n could be \r\n - I want to handle both - not sure if that is relevant or not)
"4345t435\ng54t a_file_123.xml rk\ngreg a_file_j34.xml fger 43t54"
Then I want to get matches:
a_file_123.xml
a_file_j34.xml
Here is my test code:
const str::string s = "4345t435\ng54t a_file_123.xml rk\ngreg a_file_j34.xml fger 43t54";
std::smatch matches;
if (std::regex_search(s, matches, std::regex("a_file_(.*)\\.xml")))
{
std::cout << "total: " << matches.size() << std::endl;
for (unsigned int i = 0; i < matches.size(); i++)
{
std::cout << "match: " << matches[i] << std::endl;
}
}
Output is:
total: 2
match: a_file_123.xml
match: 123
I don't quite understand why match 2 is just "123"...
You only have one match, not two, as the regex_search method returns a single match. What you printed is two group values, Group 0 (the whole match, a_file_123.xml here) and Group 1 (the capturing group value, here, 123 that is a substring captured with a capturing group you defined as (.*) in the pattern).
If you want to match multiple strings, you need to use the regex iterator, not just a regex_search that only returns the first match.
Besides, .* is too greedy and will return weird results if you have more than 1 match on the same line. It seems you want to match letter or digits, so .* can be replaced with \w+. Well, if there can really be anything, just use .*?.
Use
const std::string s = "4345t435\ng54t a_file_123.xml rk\ngreg a_file_j34.xml fger 43t54";
const std::regex rx("a_file_\\w+\\.xml");
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), rx),
std::sregex_token_iterator());
std::cout << "Number of matches: " << results.size() << std::endl;
for (auto result : results)
{
std::cout << result << std::endl;
}
See the C++ demo yielding
Number of matches: 2
a_file_123.xml
a_file_j34.xml
Notes on regex
a_file_ - a literal substring
\\w+ - 1+ word chars (letters, digits, _) (note you may use [^.]*? here instead of \\w+ if you want to match any char, 0 or more repetitions, as few as possible, up to the first .xml)
\\. - a dot (if you do not escape it, it will match any char except line break chars)
xml - a literal substring.
See the regex demo

Need regex to look for two words while ignoring everything in-between

I've been trying to make regex find both a two digit number and the word thanks, but ignore everything in-between.
Here is my current implementation in C++, but I need the two patterns to be consolidated into one:
regex pattern1{R"(\d\d)"};
regex pattern2{R"(thanks)");
string to_search = "I would like the number 98 to be found and printed, thanks.";
smatch matches;
regex_search(to_search, matches, pattern1);
for (auto match : matches) {
cout << match << endl;
}
regex_search(to_search, matches, pattern2);
for (auto match : matches) {
cout << match << endl;
}
return 0;
Thanks!
EDIT: Is there any way to change ONLY the pattern and get rid of one of the for loops? Sorry for the confusion.

Regex grouping matches with C++ 11 regex library

I'm trying to use a regex for group matching. I want to extract two strings from one big string.
The input string looks something like this:
tХB:Username!Username#Username.tcc.domain.com Connected
tХB:Username!Username#Username.tcc.domain.com WEBMSG #Username :this is a message
tХB:Username!Username#Username.tcc.domain.com Status: visible
The Username can be anything. Same goes for the end part this is a message.
What I want to do is extract the Username that comes after the pound sign #. Not from any other place in the string, since that can vary aswell. I also want to get the message from the string that comes after the semicolon :.
I tried that with the following regex. But it never outputs any results.
regex rgx("WEBMSG #([a-zA-Z0-9]) :(.*?)");
smatch matches;
for(size_t i=0; i<matches.size(); ++i) {
cout << "MATCH: " << matches[i] << endl;
}
I'm not getting any matches. What is wrong with my regex?
Your regular expression is incorrect because neither capture group does what you want. The first is looking to match a single character from the set [a-zA-Z0-9] followed by <space>:, which works for single character usernames, but nothing else. The second capture group will always be empty because you're looking for zero or more characters, but also specifying the match should not be greedy, which means a zero character match is a valid result.
Fixing both of these your regex becomes
std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
But simply instantiating a regex and a match_results object does not produce matches, you need to apply a regex algorithm. Since you only want to match part of the input string the appropriate algorithm to use in this case is regex_search.
std::regex_search(s, matches, rgx);
Putting it all together
std::string s{R"(
tХB:Username!Username#Username.tcc.domain.com Connected
tХB:Username!Username#Username.tcc.domain.com WEBMSG #Username :this is a message
tХB:Username!Username#Username.tcc.domain.com Status: visible
)"};
std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
std::smatch matches;
if(std::regex_search(s, matches, rgx)) {
std::cout << "Match found\n";
for (size_t i = 0; i < matches.size(); ++i) {
std::cout << i << ": '" << matches[i].str() << "'\n";
}
} else {
std::cout << "Match not found\n";
}
Live demo
"WEBMSG #([a-zA-Z0-9]) :(.*?)"
This regex will match only strings, which contain username of 1 character length and any message after semicolon, but second group will be always empty, because tries to find the less non-greedy match of any characters from 0 to unlimited.
This should work:
"WEBMSG #([a-zA-Z0-9]+) :(.*)"

c++ regex substring wrong pattern found

I'm trying to understand the logic on the regex in c++
std::string s ("Ni Ni Ni NI");
std::regex e ("(Ni)");
std::smatch sm;
std::regex_search (s,sm,e);
std::cout << "string object with " << sm.size() << " matches\n";
This form shouldn't give me the number of substrings matching my pattern? Because it always give me 1 match and it says that the match is [Ni , Ni]; but i need it to find every single pattern; they should be 3 and like this [Ni][Ni][Ni]
The function std::regex_search only returns the results for the first match found in your string.
Here is a code, merged from yours and from cplusplus.com. The idea is to search for the first match, analyze it, and then start again using the rest of the string (that is to say, the sub-string that directly follows the match that was found, which can be retrieved thanks to match_results::suffix ).
Note that the regex has two capturing groups (Ni*) and ([^ ]*).
std::string s("the knights who say Niaaa and Niooo");
std::smatch m;
std::regex e("(Ni*)([^ ]*)");
while (std::regex_search(s, m, e))
{
for (auto x : m)
std::cout << x.str() << " ";
std::cout << std::endl;
s = m.suffix().str();
}
This gives the following output:
Niaaa Ni aaa
Niooo Ni ooo
As you can see, for every call to regex_search, we have the following information:
the content of the whole match,
the content of every capturing group.
Since we have two capturing groups, this gives us 3 strings for every regex_search.
EDIT: in your case if you want to retrieve every "Ni", all you need to do is to replace
std::regex e("(Ni*)([^ ]*)");
with
std::regex e("(Ni)");
You still need to iterate over your string, though.