How to match two groups with different surroundings? C++

I would like to parse strings like (X->Y) or [X=>Y], and extract the X and Y parts. I did it like this:
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string text1 = "(X->Y)";
    std::string text2 = "[X=>Y]";
    std::regex my_regex("\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]");
    std::smatch reg_match;
    if (std::regex_match(text1, reg_match, my_regex)) {
        std::cout << reg_match[1].str() << ' ' << reg_match[2].str() << std::endl;
    } else {
        std::cout << "Nothing" << std::endl;
    }
}
It works with text1, but it gives an empty result with text2. What am I doing wrong? Why aren't X and Y in reg_match[1] and reg_match[2] when I run the code with text2?

This is because when you are matching text1, groups 1 and 2 get matched:
\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]
   ^^^^^^^^  ^^^^^^^^
Whereas in text2, groups 3 and 4 get matched:
\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]
                            ^^^^^^^^  ^^^^^^^^
So you have to use reg_match[3] and reg_match[4] in the case of text2.
Of course, a more versatile solution would be to check first whether reg_match[1] took part in the match. If it did, use groups 1 and 2; otherwise (it is empty), use groups 3 and 4.

As an alternative to the answer by @Sweeper, you could rewrite your regex to have only two capture groups:
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string text1 = "(X->Y)";
    std::string text2 = "[X=>Y]";
    std::regex my_regex("[[(]([A-Z]+)(?:->|=>)([A-Z]+)[)\\]]");
    std::smatch reg_match;
    if (std::regex_match(text1, reg_match, my_regex)) {
        std::cout << reg_match[1].str() << ' ' << reg_match[2].str() << std::endl;
    } else {
        std::cout << "Nothing" << std::endl;
    }
}
This however has the small disadvantage that it will also match a few more variants:
(X=>Y)
[X->Y)
(X=>Y]
etc...

Related

Regex match the entire string [duplicate]

I'm reading a text file in the form of
People list
[Jane]
Female
31
...
and for each line I want to loop through and find the line that contains "[...]"
For example, [Jane]
I came up with the regex expression
"(^[\w+]$)"
which I tested that it works using regex101.com.
However, when I try to use that in my code, it fails to match with anything.
Here's my code:
void Jane::JaneProfile() {
    // read each line, for each [title], add the next lines into its array
    std::smatch matches;
    for(int i = 0; i < m_numberOfLines; i++) { // #lines in text file
        std::regex pat ("(^\[\w+\]$)");
        if(regex_search(m_lines.at(i), matches, pat)) {
            std::cout << "smatch " << matches.str(0) << std::endl;
            std::cout << "smatch.size() = " << matches.size() << std::endl;
        } else
            std::cout << "wth" << std::endl;
    }
}
When I run this code, every line ends up in the else branch and nothing matches...
I searched for answers, but I got confused when I saw that in C++ you have to use double backslashes instead of a single backslash to escape... But it didn't work for my code even when I used double backslashes...
Where did I go wrong?
By the way, I'm using Qt Creator 3.6.0 Based on (Desktop) Qt 5.5.1 (Clang 6.1 (Apple), 64 bit)
---Edit----
I tried doing:
std::regex pat (R"(^\[\\w+\]$)");
But I get an error saying
Use of undeclared identifier 'R'
I already have #include <regex> but do I need to include something else?
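Either escape the backslashes or use a raw string literal with a delimiter that won't appear in the regex. Raw string literals require C++11 or later, so the "Use of undeclared identifier 'R'" error most likely means the compiler is not running in C++11 mode.
escaped:
std::regex pat("^\\[\\w+\\]$");
raw string literal:
std::regex pat(R"regex(^\[\w+\]$)regex");
working demo (adapted from the OP's posted code):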
#include <cstddef>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
int main()
{
    auto test_data =
        "People list\n"
        "[Jane]\n"
        "Female\n"
        "31";
    // initialise test data
    std::istringstream source(test_data);
    std::string buffer;
    std::vector<std::string> lines;
    while (std::getline(source, buffer)) {
        lines.push_back(std::move(buffer));
    }
    // test the regex
    // read each line, for each [title], add the next lines into its array
    std::smatch matches;
    // std::size_t avoids a signed/unsigned comparison with lines.size()
    for (std::size_t i = 0; i < lines.size(); ++i) { // #lines in text file
        static const std::regex pat ("(^\\[\\w+\\]$)");
        if (regex_search(lines.at(i), matches, pat)) {
            std::cout << "smatch " << matches.str() << std::endl;
            std::cout << "smatch.size() = " << matches.size() << std::endl;
        } else
            std::cout << "wth" << std::endl;
    }
    return 0;
}
expected output:
wth
smatch [Jane]
smatch.size() = 2
wth
wth

C++ regex library

I have this sample code
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
    std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
    std::smatch m;
    std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
    std::regex e (reg);
    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
    return 0;
}
I need to get the number between pi and " -> (piMYNUMBER")
In an online regex service my regex (?<=pi)(\d+)(?=") works fine, but the C++ regex doesn't match anything.
Does anyone know what is wrong with my expression?
The std::regex flavors do not support lookbehinds. Instead, match pi and the closing " as literal context and capture only the digits between them:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int main() {
    std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
    std::smatch m;
    std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
    // std::string reg(R"(pi(\d+)\")");
    std::regex e (reg);
    // Collect only the Group 1 value of every match
    std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
                                     std::sregex_token_iterator());
    // Demo printing the results:
    std::cout << "Number of matches: " << results.size() << std::endl;
    for( auto & p : results ) std::cout << p << std::endl;
    return 0;
}
Output:
Number of matches: 1
656
Here, the pi(\d+)" pattern matches:
pi - a literal substring
(\d+) - one or more digits, captured into Group 1
" - a literal double quote.
Note that the fourth argument to std::sregex_token_iterator is 1 because only the Group 1 values need to be collected.

c++ regex: how to use sub-matches

This code will output 192.168.1.105 but I want it to find each number-part of the ip. The output would be
192
168
1
105
Since ip_result only has 1 sub-match (192.168.1.105), how would I get 4 sub-matches, one for each number part?
#include <iostream>
#include <regex>
#include <string>
std::regex ip_reg("\\d{1,3}."
                  "\\d{1,3}."
                  "\\d{1,3}."
                  "\\d{1,3}");
void print_results(const std::string& ip) {
    std::smatch ip_result;
    if (std::regex_match(ip, ip_result, ip_reg))
        for (auto pattern : ip_result)
            std::cout << pattern << std::endl;
    else
        std::cout << "No match!" << std::endl;
}
int main() {
    const std::string ip_str("192.168.1.105");
    print_results(ip_str); // no ip:: qualifier: print_results is declared at global scope
}
I rewrote ip_reg to use sub-patterns (with the dots escaped) and print_results to use iterators, skipping the first element because it holds the whole match:
std::regex ip_reg("(\\d{1,3})\\."
                  "(\\d{1,3})\\."
                  "(\\d{1,3})\\."
                  "(\\d{1,3})");
void print_results(const std::string& ip) {
    std::smatch ip_result;
    if (std::regex_match(ip, ip_result, ip_reg)) {
        std::smatch::iterator ip_it = ip_result.begin();
        for (std::advance(ip_it, 1);
             ip_it != ip_result.end();
             std::advance(ip_it, 1))
            std::cout << *ip_it << std::endl;
    } else
        std::cout << "No match!" << std::endl;
}
Alternatively, replace std::regex_match with std::regex_search and call it in a loop, each time continuing with the suffix that follows the previous match. The expression then only needs to match a single run of digits:
std::regex ip_reg{ "\\d{1,3}" };
void print_results(const std::string& ip_str) {
    std::string ip = ip_str; //make a copy!
    std::smatch ip_result;
    while (std::regex_search(ip, ip_result, ip_reg)){ //loop
        std::cout << ip_result[0] << std::endl;
        ip = ip_result.suffix().str(); //remove "192", then "168" ...
    }
}
Output:
192
168
1
105

This regex doesn't work in c++

It is supposed to match "abababab" since "ab" is repeated more than two times consecutively but the code isn't printing any output.
Is there some other trick to using regex in C++?
I tried with other languages and it works just fine.
#include<bits/stdc++.h>
int main(){
    std::string s ("xaxababababaxax");
    std::smatch m;
    std::regex e ("(.+)\1\1+");
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
    return 0;
}
The problem is that, in an ordinary string literal, the compiler treats \1 as an octal escape sequence, so the regex receives the character with value 1 rather than a backslash followed by 1, and std::regex never sees the backreference. Either use a raw string literal, R"((.+)\1\1+)", or escape the backslashes, as shown here:
#include <regex>
#include <string>
#include <iostream>
int main(){
    std::string s ("xaxababababaxax");
    std::smatch m;
    std::regex e ("(.+)\\1\\1+");
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
    return 0;
}
This produces the output:
abababab ab