C++ regex library - c++

I have this sample code
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
std::regex e (reg);
std::cout << "Target sequence: " << s << std::endl;
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
I need to get number between pi and " -> (piMYNUMBER")
In online regex service my regex works fine (?<=pi)(\d+)(?=") but c++ regex don't match anything.
Who knows what is wrong with my expression?
Best regards

That is correct, C++ std::regex flavors do not support lookbehinds. You need to capture the digits between pi and ":
#include <iostream>
#include <vector>
#include <regex>
int main() {
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
// std::string reg(R"(pi(\d+)\")");
std::regex e (reg);
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
std::sregex_token_iterator());
// Demo printing the results:
std::cout << "Number of matches: " << results.size() << std::endl;
for( auto & p : results ) std::cout << p << std::endl;
return 0;
}
See the C++ demo. Output:
Number of matches: 1
656
Here, pi(\d+)" pattern matches
pi - a literal substring
(\d+) - captures 1+ digits into Group 1
" - consumes a double quote.
Note the fourth argument to std::sregex_token_iterator, it is 1 because you need to collect only Group 1 values.

Related

Can't match curly brackets using regex_search [duplicate]

It is supposed to match "abababab" since "ab" is repeated more than two times consecutively but the code isn't printing any output.
Is there some other trick in using regex in C++.
I tried with other languages and it works just fine.
#include<bits/stdc++.h>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\1\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Your problem is your backslashes are escaping the '1''s in your string. You need to inform std::regex to treat them as '\' 's. You can do this by using a raw string R"((.+)\1\1+)", or by escaping the slashes, as shown here:
#include <regex>
#include <string>
#include <iostream>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\\1\\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Which produces the output
abababab ab

C++: regex matching code, printing multiple matches?

I'm using the following code which is similar to Stroustrup's C++ 4th Edition Page 127&128. Per output log below, it prints the first match, however not the match for the trailing -XXXX digits.
Does anyone know why the trailing digits are not matched and/or printed??
Thanks
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char *argv[])
{
// ZIP code pattern: XXddddd-dddd and variants
regex pat (R"(\w{2}\s*\d{5}(−\d{4})?)");
int lineno = 0;
for (string line; getline(cin,line);) {
++lineno;
smatch matches; // matched strings go here
if (regex_search(line, matches, pat)) // search for pat in line
for (auto p : matches) {
cout << p << " ";
}
cout << endl;
// cout << lineno << ": " << matches[0] << '\n';
}
return 0;
}
Output log:
$ ./a.out
AB00000-0000
AB00000
− is not -. That are two different symbols. You have − in the code and - in the input. Here I fixed the code:
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char *argv[])
{
// ZIP code pattern: XXddddd-dddd and variants
regex pat (R"(\w{2}\s*\d{5}(-\d{4})?)");
int lineno = 0;
for (string line; getline(cin,line);) {
++lineno;
smatch matches; // matched strings go here
if (regex_search(line, matches, pat)) // search for pat in line
for (auto p : matches) {
cout << p << " ";
}
cout << endl;
// cout << lineno << ": " << matches[0] << '\n';
}
return 0;
}

How to match two groups with different surroundings? C++

I would like to parse strings like (X->Y) or [X=>Y], and extract the X and Y parts. I did it like this:
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string text1 = "(X->Y)";
std::string text2 = "[X=>Y]";
std::regex my_regex("\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]");
std::smatch reg_match;
if(std::regex_match(text1, reg_match, my_regex)) {
std::cout << reg_match[1].str() << ' ' << reg_match[2].str() << std::endl;
} else {
std::cout << "Nothing" << std::endl;
}
}
It works with text1, but it gives an empty result with text2. What do I wrong? Why isn't X and Y in reg_match[1] and reg_match[2] if I run the code with text2?
This is because when you are matching text1, groups 1 and 2 gets matched:
\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]
^^^^^^ ^^^^^
Whereas in text2, groups 3 and 4 gets matched:
\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]
^^^^^^ ^^^^^
So you have to use reg_match[3] and reg_match[4] in the case of text2.
Of course, a more versatile solution would be to check whether reg_match[1] is empty first. If it is, use group 1 and 2, otherwise, use group 3 and 4.
Alternatively to the given answer by #Sweeper you could rewrite your regex to only have 2 match groups:
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string text1 = "(X->Y)";
std::string text2 = "[X=>Y]";
std::regex my_regex("[[(]([A-Z]+)(?:->|=>)([A-Z]+)[)\\]]");
std::smatch reg_match;
if(std::regex_match(text1, reg_match, my_regex)) {
std::cout << reg_match[1].str() << ' ' << reg_match[2].str() << std::endl;
} else {
std::cout << "Nothing" << std::endl;
}
}
This however has the small disadvantage that it will also match a few more variants:
(X=>Y)
[X->Y)
(X=>Y]
etc...

MSVC regular expression match

I am trying to match a literal number, e.g. 1600442 using a set of regular expressions in Microsoft Visual Studio 2010. My regular expressions are simply:
1600442|7654321
7895432
The problem is that both of the above matches the string.
Implementing this in Python gives the expected result:
import re
serial = "1600442"
re1 = "1600442|7654321"
re2 = "7895432"
m = re.match(re1, serial)
if m:
print "found for re1"
print m.groups()
m = re.match(re2, serial)
if m:
print "found for re2"
print m.groups()
Gives output
found for re1
()
Which is what I expected. Using this code in C++ however:
#include <string>
#include <iostream>
#include <regex>
int main(){
std::string serial = "1600442";
std::tr1::regex re1("1600442|7654321");
std::tr1::regex re2("7895432");
std::tr1::smatch match;
std::cout << "re1:" << std::endl;
std::tr1::regex_search(serial, match, re1);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl << "re2:" << std::endl;
std::tr1::regex_search(serial, match, re2);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
gives me:
re1:
1600442
re2:
1600442
which is not what I expected. Why do I get match here?
The smatch does not get overwritten by the second call to regex_search thus, it is left intact and contains the first results.
You can move the regex searching code to a separate method:
void FindMeText(std::regex re, std::string serial)
{
std::smatch match;
std::regex_search(serial, match, re);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
}
int main(){
std::string serial = "1600442";
std::regex re1("^(?:1600442|7654321)");
std::regex re2("^7895432");
std::cout << "re1:" << std::endl;
FindMeText(re1, serial);
std::cout << "re2:" << std::endl;
FindMeText(re2, serial);
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
Result:
Note that Python re.match searches for the pattern match at the start of string only, thus I suggest using ^ (start of string) at the beginning of each pattern.

This regex doesn't work in c++

It is supposed to match "abababab" since "ab" is repeated more than two times consecutively but the code isn't printing any output.
Is there some other trick in using regex in C++.
I tried with other languages and it works just fine.
#include<bits/stdc++.h>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\1\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Your problem is your backslashes are escaping the '1''s in your string. You need to inform std::regex to treat them as '\' 's. You can do this by using a raw string R"((.+)\1\1+)", or by escaping the slashes, as shown here:
#include <regex>
#include <string>
#include <iostream>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\\1\\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Which produces the output
abababab ab