Extract digits from a string using regex in C++

I wrote this C++ code to extract digits from mixed strings delimited by xxx and yyy.
Here is my code:
#include <iostream>
#include <regex>
using namespace std;
int main() {
    string text = "xxx1111yyy xxxrandomstring2222yyy";
    string start_delimiter = "xxx";
    string end_delimiter = "yyy";
    regex pattern(start_delimiter + "([0-9]+)" + end_delimiter);
    smatch match;
    while (regex_search(text, match, pattern)) {
        cout << match[1] << endl;
        text = match.suffix().str();
    }
    return 0;
}
I expect the output:
1111
2222
But the only output I get is: 1111
Where is my mistake?

As I understand it, the delimiters xxx and yyy are fixed, while randomstring is not: it can be any string.
So the error is simply in your regex pattern. xxx([0-9]+)yyy requires the digits to follow xxx immediately, so xxxrandomstring2222yyy never matches.
It should be something like this:
regex pattern("xxx.*?(\\d+).*?yyy");
The whole code could look like this:
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string text =
        "xxxrandomstring2222yyy xxx1111yyy";
    std::regex pattern("xxx.*?(\\d+).*?yyy");
    std::smatch match;
    while (std::regex_search(text, match, pattern)) {
        std::cout << match[1] << std::endl;
        text = match.suffix().str();
    }
    return 0;
}

Related

c++ regular matching ABC HHHH fetID_3141 ProID_1045

ABC HHHH fetID_3141 ProID_1045
The above is a string. I need to extract fetID_3141 and ProID_1045, mainly the numbers 3141 and 1045. How can I do this with regular expression matching in C++?
You can use std::regex for this.
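#include <iostream>
#include <regex>
#include <string>
int main()
{
    // letters, an underscore, then the digits we want in group 1
    std::regex r("[a-zA-Z]+_([0-9]+)");
    std::smatch m;
    std::string s = "ABC HHHH fetID_3141 ProID_1045";
    while (std::regex_search(s, m, r))
    {
        std::cout << m[1] << std::endl;
        s = m.suffix().str(); // continue with the text after this match
    }
}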

How to match two groups with different surroundings? C++

I would like to parse strings like (X->Y) or [X=>Y], and extract the X and Y parts. I did it like this:
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string text1 = "(X->Y)";
    std::string text2 = "[X=>Y]";
    std::regex my_regex("\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]");
    std::smatch reg_match;
    if(std::regex_match(text1, reg_match, my_regex)) {
        std::cout << reg_match[1].str() << ' ' << reg_match[2].str() << std::endl;
    } else {
        std::cout << "Nothing" << std::endl;
    }
}
It works with text1, but it gives an empty result with text2. What am I doing wrong? Why aren't X and Y in reg_match[1] and reg_match[2] if I run the code with text2?
This is because when you match text1, groups 1 and 2 get matched:
\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]
   ^^^^^^^^  ^^^^^^^^
Whereas in text2, groups 3 and 4 get matched:
\\(([A-Z]+)->([A-Z]+)\\)|\\[([A-Z]+)=>([A-Z]+)\\]
                            ^^^^^^^^  ^^^^^^^^
So you have to use reg_match[3] and reg_match[4] in the case of text2.
Of course, a more versatile solution would be to check first whether group 1 took part in the match (reg_match[1].matched). If it did, use groups 1 and 2; otherwise, use groups 3 and 4.
As an alternative to the answer by @Sweeper, you could rewrite your regex to have only 2 capture groups:
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string text1 = "(X->Y)";
    std::string text2 = "[X=>Y]";
    std::regex my_regex("[[(]([A-Z]+)(?:->|=>)([A-Z]+)[)\\]]");
    std::smatch reg_match;
    if(std::regex_match(text1, reg_match, my_regex)) {
        std::cout << reg_match[1].str() << ' ' << reg_match[2].str() << std::endl;
    } else {
        std::cout << "Nothing" << std::endl;
    }
}
This however has the small disadvantage that it will also match a few more variants:
(X=>Y)
[X->Y)
(X=>Y]
etc...

Boost regex cpp for finding strings between %% with output excluding the % character itself

I am having a problem with Boost regex in C++. I want to match a string like
"Hello %world% regex %cpp%", and the expected output is world, cpp.
Can somebody suggest a regex for this?
Thanks
Anil
I personally prefer "\\%([^\\%]*)\\%" (or as a raw string R"r(\%([^\%]*)\%)r").
It doesn't rely on non-greedy quantifiers. It is essentially:
one percent character \\%
any number of non-percent characters [^\\%]*
one percent character \\%
I know this is tagged boost, but here's a solution with std::regex:
#include <string>
#include <regex>
#include <iostream>
int main()
{
    using namespace std;
    string source = "Hello %world%";
    regex match_percent_enclosed (R"_(\%([^\%]*)\%)_");
    smatch between_percent;
    bool found_match = regex_search(source,between_percent,match_percent_enclosed);
    if(found_match && between_percent.size()>1)
        cout << "found: \"" << between_percent[1].str() << "\"." << endl;
    else
        cout << "no match found." << endl;
}
This may give you an idea:
%(.+?)%
Result:
Match 1
1. world
Match 2
1. cpp
You can use this regex: \%(.*?)\% (the lazy quantifier captures the smallest possible group).
Online regex: https://regex101.com/r/dSCE2a/2
And here is the code with Boost:
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main()
{
    boost::cmatch mat;
    boost::regex reg( "\\%(.*?)\\%" );
    const char szStr[] = "Hello %world% regex %cpp%";
    const char *where = szStr; // const pointer, so no cast is needed below
    while (boost::regex_search(where, mat, reg))
    {
        cout << mat[1] << endl; // 0 for whole match, 1 for sub
        where = mat[0].second; // continue right after the end of this match
    }
}

Regex search overlapping matches c++11

What regular expression should I use to find all occurrences that:
Start with 55 or 66
are followed by a minimum of 8 characters in the range [0-9a-fA-F] (HEX numbers)
End with \r (a carriage return)
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result but not the second one.
Edit clarification
I search the string using regex. It selects the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it contains only base-16 (HEX) digits and ends with a carriage return.
[start] [information part minimum 8 chars] [end symbol: carriage return]
I'm using the std::regex library in C++11 with the ECMAScript flag.
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
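#include <iostream>
#include <string>
#include <regex>
int main()
{
    // repeated search (see also std::regex_iterator)
    std::string log("0055\r0655036608090705\r");
    std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
    std::smatch sm;
    while(std::regex_search(log, sm, r))
    {
        std::cout << sm.str() << '\n';
        // Build the next input before touching log: sm refers into log,
        // so assigning to log first would invalidate sm.suffix().
        std::string rest = sm.str() + sm.suffix().str();
        rest[0] = 'a'; // break the match just found so the search moves on
        log = rest;
    }
}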
Edit: working regex solution based on comments
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main()
{
    // repeated search (see also std::regex_iterator)
    std::string s("0055\r06550003665508090705\r0970");
    // the lookahead consumes nothing, so matches may overlap; group 1 holds the text
    std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
    auto words_begin =
        std::sregex_iterator(s.begin(), s.end(), r);
    auto words_end = std::sregex_iterator();
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words:\n";
    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::string match_str = s.substr(match.position(1), match.length(1) - 1); // -1 drops the trailing CR
        std::cout << match_str << " =" << match.position(1) << '\n';
    }
}
You are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
int main() {
    std::string subject("0055\r06550003665508090705\r0970");
    try {
        std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
        std::sregex_iterator next(subject.begin(), subject.end(), re);
        std::sregex_iterator end;
        while (next != end) {
            std::smatch match = *next;
            std::cout << match.str(1) << "\n";
            ++next;
        }
    } catch (const std::regex_error& e) {
        // Syntax error in the regular expression
    }
    return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

Taking every character literally in RegEx

Using std::regex, I want to create a function that takes, for example, a string
and creates a RegEx from that string, but with every char of the string matched literally.
For example, let's say s("[ds-aa]"); I want to create a RegEx from that string that treats it literally, i.e. one equivalent to "\[ds\-aa\]".
Assuming you are using std::regex with the default ECMAScript flavor, you just need to escape these characters:
. * + ? ^ $ { } ( ) | [ ] \
So, you can use
#include <regex>
#include <string>
#include <iostream>
using namespace std;
std::string regexEscape(std::string str) {
    return std::regex_replace(str, std::regex(R"([.^$|()[\]{}*+?\\])"), R"(\$&)");
}
int main()
{
    std::cout << "Test escaped pattern: " << regexEscape("[da-d$\\]") << std::endl; // => \[da-d\$\\\]
    std::string key = "\\56";
    string input = "John\\56 Fred\\12";
    std::regex rx(R"((\w+))" + regexEscape(key));
    smatch m;
    if (std::regex_search(input, m, rx)) {
        std::cout << "Who has \\56? - " << m[1].str() << std::endl;
    }
}
See IDEONE demo
Results:
Test escaped pattern: \[da-d\$\\\]
Who has \56? - John