Taking every character literally in RegEx - c++

Using std::regex, I want to write a function that takes a string
and creates a RegEx from it, but with every char of the string matched literally.
For example, let's say s("[ds-aa]"); I want to create a RegEx from that string, taken literally, so that the resulting RegEx is equivalent to "\[ds\-aa\]", i.e. it matches the text [ds-aa] literally.

Assuming you are using std::regex with the default ECMAScript regex flavor, you just need to escape these characters:
. * + ? ^ $ { } ( ) | [ ] \
So you can use:
#include <regex>
#include <string>
#include <iostream>
// Prefix every ECMAScript metacharacter with a backslash ($& is the whole match).
std::string regexEscape(const std::string& str) {
    return std::regex_replace(str, std::regex(R"([.^$|()[\]{}*+?\\])"), R"(\$&)");
}
int main()
{
    std::cout << "Test escaped pattern: " << regexEscape("[da-d$\\]") << std::endl; // => \[da-d\$\\\]
    std::string key = "\\56";
    std::string input = "John\\56 Fred\\12";
    std::regex rx(R"((\w+))" + regexEscape(key));
    std::smatch m;
    if (std::regex_search(input, m, rx)) {
        std::cout << "Who has \\56? - " << m[1].str() << std::endl;
    }
}
See IDEONE demo
Results:
Test escaped pattern: \[da-d\$\\\]
Who has \56? - John

Related

extract digits from string using Regex in c++

I have written this C++ code to extract digits from mixed strings delimited by xxx and yyy.
Here is my code:
#include <iostream>
#include <regex>
using namespace std;
int main() {
    string text = "xxx1111yyy xxxrandomstring2222yyy";
    string start_delimiter = "xxx";
    string end_delimiter = "yyy";
    regex pattern(start_delimiter + "([0-9]+)" + end_delimiter);
    smatch match;
    while (regex_search(text, match, pattern)) {
        cout << match[1] << endl;
        text = match.suffix().str();
    }
    return 0;
}
I expect the output:
1111
2222
But I only get: 1111
Where is my mistake?
As I understand it, the delimiters xxx and yyy are fixed, while randomstring is not, so it can be any string.
So the error is simply in your regex pattern: xxx([0-9]+)yyy requires the digits to follow xxx immediately, so xxxrandomstring2222yyy never matches.
It should be something like this:
regex pattern("xxx.*?(\\d+).*?yyy");
The whole program could look like this:
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string text =
        "xxxrandomstring2222yyy xxx1111yyy";
    std::regex pattern("xxx.*?(\\d+).*?yyy");
    std::smatch match;
    while (std::regex_search(text, match, pattern)) {
        std::cout << match[1] << std::endl;
        text = match.suffix().str();
    }
    return 0;
}

Boost regex cpp for finding strings between %% with output excluding the % character itself

I am having a problem with Boost regex in C++. I want to match a string like
"Hello %world% regex %cpp%", and the expected output is world, cpp.
Can somebody suggest a regex for this?
Thanks
Anil
I personally prefer "\\%([^\\%]*)\\%" (or, as a raw string, R"r(\%([^\%]*)\%)r").
It doesn't rely on non-greedy quantifiers.
It is essentially:
one percent character \\%
any number of non-percent characters [^\\%]*
one percent character \\%
I know this is tagged boost, but here's a solution with std::regex:
#include <string>
#include <regex>
#include <iostream>
int main()
{
    using namespace std;
    string source = "Hello %world%";
    regex match_percent_enclosed (R"_(\%([^\%]*)\%)_");
    smatch between_percent;
    bool found_match = regex_search(source,between_percent,match_percent_enclosed);
    if(found_match && between_percent.size()>1)
        cout << "found: \"" << between_percent[1].str() << "\"." << endl;
    else
        cout << "no match found." << endl;
}
This may give you an idea:
%(.+?)%
Result:
Match 1
1. world
Match 2
1. cpp
You can use this regex: \%(.*?)\% (the lazy .*? captures the smallest possible group)
Online regex: https://regex101.com/r/dSCE2a/2
And here is the code with Boost:
#include <iostream>
#include <boost/regex.hpp>
int main()
{
    boost::cmatch mat;
    boost::regex reg( "\\%(.*?)\\%" );
    const char szStr[] = "Hello %world% regex %cpp%";
    const char* where = szStr;
    while (boost::regex_search(where, mat, reg))
    {
        std::cout << mat[1] << std::endl; // 0 for whole match, 1 for sub
        where = mat[0].second; // continue after the end of this match
    }
}

Regex search overlapping matches c++11

What regular expression should I use to find all occurrences that:
start with 55 or 66,
are followed by at least 8 characters in the range [0-9a-fA-F] (hex digits),
and end with \r (a carriage return)?
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result, not the second one.
Edit (clarification):
I search the string using a regex to select the message for further parsing. The target string can start anywhere in the string. It is only valid if it contains nothing but base-16 (hex) digits and ends with a carriage return.
[start] [information part, minimum 8 chars] [end symbol: carriage return]
I'm using the std::regex library in C++11 with the ECMAScript flag.
Edit
I have created an alternative solution that gives me the expected result, but it is not a pure regex solution.
#include <iostream>
#include <string>
#include <regex>
int main()
{
    // repeated search (see also std::regex_iterator)
    std::string log("0055\r0655036608090705\r");
    std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
    std::smatch sm;
    while (std::regex_search(log, sm, r))
    {
        std::cout << sm.str() << '\n';
        // Build the new string before assigning: sm points into log.
        log = sm.str() + sm.suffix().str();
        log[0] = 'a'; // spoil the match start so the next search moves on
    }
}
** Edit: Working regex solution based on comments **
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main()
{
    // repeated search (see also std::regex_iterator)
    std::string s("0055\r06550003665508090705\r0970");
    std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
    auto words_begin =
        std::sregex_iterator(s.begin(), s.end(), r);
    auto words_end = std::sregex_iterator();
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words:\n";
    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::string match_str = s.substr(match.position(1), match.length(1) - 1); // -1: drop the \r
        std::cout << match_str << " =" << match.position(1) << '\n';
    }
}
You are actually looking for overlapping matches. These can be found with a regex lookahead, like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
The matches you want are in group 1; the full match itself is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
int main() {
    std::string subject("0055\r06550003665508090705\r0970");
    try {
        std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
        std::sregex_iterator next(subject.begin(), subject.end(), re);
        std::sregex_iterator end;
        while (next != end) {
            std::smatch match = *next;
            std::cout << match.str(1) << "\n";
            ++next;
        }
    } catch (const std::regex_error&) {
        // Syntax error in the regular expression
    }
    return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

C++ regex library

I have this sample code:
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
    std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
    std::smatch m;
    std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
    std::regex e (reg);
    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
    return 0;
}
I need to get the number between pi and " (piMYNUMBER").
In an online regex tester my regex (?<=pi)(\d+)(?=") works fine, but the C++ regex doesn't match anything.
Does anyone know what is wrong with my expression?
Best regards
Right: the C++ std::regex flavors do not support lookbehinds. You need to capture the digits between pi and " instead:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main() {
    std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
    std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
    // std::string reg(R"(pi(\d+)\")");
    std::regex e (reg);
    // Submatch index 1: collect only the Group 1 values
    std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
                                     std::sregex_token_iterator());
    // Demo printing the results:
    std::cout << "Number of matches: " << results.size() << std::endl;
    for( auto & p : results ) std::cout << p << std::endl;
    return 0;
}
See the C++ demo. Output:
Number of matches: 1
656
Here, the pi(\d+)" pattern matches:
pi - a literal substring
(\d+) - captures 1+ digits into Group 1
" - consumes a double quote.
Note the fourth argument to std::sregex_token_iterator: it is 1 because you only need to collect the Group 1 values.

Replacing characters by a modified version of them in a string

I want to replace each of the characters below (or substrings, in the case of && and ||) in an input string with an escaped version, using regex replace:
+ - ! ( ) { } [ ] ^ " ~ * ? : \ && ||
How can I express this when constructing the std::regex?
For example, if I have
"(1+1):2"
I want to get the output:
"\(1\+1\)\:2"
The final code looks something like this:
std::string s ("(1+1):2");
std::regex e ("???"); // what should I put here ?
std::cout << std::regex_replace (s,e,"\\$2"); // is this correct ?
You can use std::regex_replace with capture:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
    regex regex_a("(\\+|-|!|\\(|\\)|\\{|\\}|\\[|\\]|\\^|\"|~|\\*|\\?|:|\\\\|&&|\\|\\|)");
    cout << regex_replace("(1+1):2", regex_a, "\\$0") << endl;
}
This prints
$ ./a.out
\(1\+1\)\:2