C++ regex for overlapping matches - c++

I have a string 'CCCC' and I want to match 'CCC' in it, with overlap.
My code:
...
std::string input_seq = "CCCC";
std::regex re("CCC");
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\t" << "\t" << match.position() << "\t" << "\n";
next++;
}
...
However this only returns
CCC 0
and skips the CCC 1 solution, which is needed for me.
I read about non-greedy '?' matching, but I could not make it work

Your regex can be put into the capturing parentheses that can be wrapped with a positive lookahead.
To make it work on Mac, too, make sure the regex matches (and thus consumes) a single char at each match by placing a . (or - to also match line break chars - [\s\S]) after the lookahead.
Then, you will need to amend the code to get the first capturing group value like this:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
std::string input_seq = "CCCC";
std::regex re("(?=(CCC))."); // <-- PATTERN MODIFICATION
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\t" << "\t" << match.position() << "\t" << "\n"; // <-- SEE HERE
next++;
}
return 0;
}
See the C++ demo
Output:
CCC 0
CCC 1

Related

Regex not able to show chars after space

I want to break this string into two parts
{[data1]name=NAME1}{[data2]name=NAME2}
1) {[data1]name=NAME1}
2){[data2]name=NAME2}
I am using Regex to attain this and this works fine with the above string , but if i add space to the name then the regex does not take characters after the space.
{[data1]name=NAME 1}{[data2]name=NAME 2}
In this string it breaks only till NAME and does not show the 1 and 2 chars
This is my code
std::string stdstrData = "{[data1]name=NAME1}{[data2]name=NAME2}"
std::vector<std::string> commandSplitUnderScore;
std::regex re("\\}");
std::sregex_token_iterator iter(stdstrData.begin(), stdstrData.end(), re, -1);
std::sregex_token_iterator end;
while (iter != end) {
if (iter->length()) {
commandSplitUnderScore.push_back(*iter);
}
++iter;
}
for (auto& str : commandSplitUnderScore) {
std::cout << str << std::endl;
}
A good place to start is to use regex101.com and debug your regex before putting it into your c++ code. e.g. https://regex101.com/r/ID6OSj/1 (don't forget to escape your C++ string properly when copying the regex you made on that site).
Example :
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string input{ "{[data]name=NAME1}{[data]name=NAME2}" };
std::regex rx{ "(\\{\\[data\\].*?\\})(\\{\\[data\\].*?\\})" };
std::smatch match;
if (std::regex_match(input, match, rx))
{
std::cout << match[1] << "\n"; // match[1] is first group
std::cout << match[2] << "\n"; // match[2] is second group
}
return 0;
}

Can't match curly brackets using regex_search [duplicate]

It is supposed to match "abababab" since "ab" is repeated more than two times consecutively but the code isn't printing any output.
Is there some other trick in using regex in C++.
I tried with other languages and it works just fine.
#include<bits/stdc++.h>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\1\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Your problem is your backslashes are escaping the '1''s in your string. You need to inform std::regex to treat them as '\' 's. You can do this by using a raw string R"((.+)\1\1+)", or by escaping the slashes, as shown here:
#include <regex>
#include <string>
#include <iostream>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\\1\\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Which produces the output
abababab ab

Get sentences with regular expressions in c++

I need to get sentences with regex from String with word "walk". Now I am trying just to get sentences
std::string s ("Hello world! My name is Mike. Why so serious?");
std::smatch m;
std::regex e ("^\\s+[A-Za-z,;'\"\\s]+[.?!]$"); // matches words beginning by "sub"
while (std::regex_search (s,m,e)) {
for (auto w:m)
std::cout << w << "\n" ;
}
And this doesn't work.
Apart from start and end of the string in regex, you are forgetting to update the 's' with the suffix.
#include <iostream>
#include <regex>
int main()
{
std::string s ("Hello world! My name is Mike. Why so serious?");
std::smatch m;
std::regex e ("\\s?[A-Za-z,;'\"\\s]+[.?!]");
while(std::regex_search (s,m,e))
{
std::cout << m.str() << "\n" ;
s = m.suffix();
}
return 0;
}

Regex search overlapping matches c++11

What regex expression should I use to search all occurrences that match:
Start with 55 or 66
followed by a minimum 8 characters in the range of [0-9a-fA-F] (HEX numbers)
Ends with \r (a carriage return)
Example string: 0205065509085503400066/r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can sie, this finds onlny the first result but not the second one.
Edit clarification
I search the string using Regex. It'll select the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it only contains base-16 (HEX) numbers, and ends with a carriage return.
[start] [information part minimum 8 chars] [end symbol-carigge return]
I'm using the std::regex library in c++11 with the flag ECMAScript
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also
std::regex_iterator)
std::string log("0055\r0655036608090705\r");
std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
std::smatch sm;
while(regex_search(log, sm, r))
{
std::cout << sm.str() << '\n';
log = sm.str();
log += sm.suffix();
log[0] = 'a' ;
}
}
** Edit: Working regex solution based on comments **
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also
std::regex_iterator)
std::string s("0055\r06550003665508090705\r0970");
std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
auto words_begin =
std::sregex_iterator(s.begin(), s.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found "
<< std::distance(words_begin, words_end)
<< " words:\n";
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = s.substr(match.position(1), match.length(1) - 1); //-1 cr
std::cout << match_str << " =" << match.position(1) << '\n';
}
}
Your are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::string subject("0055\r06550003665508090705\r0970");
try {
std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\n";
next++;
}
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

C++ regex library

I have this sample code
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
std::regex e (reg);
std::cout << "Target sequence: " << s << std::endl;
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
I need to get number between pi and " -> (piMYNUMBER")
In online regex service my regex works fine (?<=pi)(\d+)(?=") but c++ regex don't match anything.
Who knows what is wrong with my expression?
Best regards
That is correct, C++ std::regex flavors do not support lookbehinds. You need to capture the digits between pi and ":
#include <iostream>
#include <vector>
#include <regex>
int main() {
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
// std::string reg(R"(pi(\d+)\")");
std::regex e (reg);
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
std::sregex_token_iterator());
// Demo printing the results:
std::cout << "Number of matches: " << results.size() << std::endl;
for( auto & p : results ) std::cout << p << std::endl;
return 0;
}
See the C++ demo. Output:
Number of matches: 1
656
Here, pi(\d+)" pattern matches
pi - a literal substring
(\d+) - captures 1+ digits into Group 1
" - consumes a double quote.
Note the fourth argument to std::sregex_token_iterator, it is 1 because you need to collect only Group 1 values.