find alphabetic substring - c++

I have the following strings, from which I want to extract only the alphabetic parts (alphabetic substrings) that are longer than one character:
% d. i.p.p. attendu --> attendu
après. expertise --> apr, expertise
n.c.p.c. condamner --> condamner
I am trying the following piece of code:
#include <regex>
#include <iostream>
void main()
{
const std::string s = "% d. i.p.p. attendu";
std::regex rgx("[a-zA-Z]{2,20}");
std::smatch match;
if (std::regex_search(s.begin(), s.end(), match, rgx))
std::cout << "match: " << match[1] << '\n';
}
But I get the following error when I run the code:
Terminate called after throwing an instance of 'std::regex_error' what(): regex_error
Can you please help me,
Thank you,
Hani.
Ok I managed to use boost since gcc's regex is an abomination.
#include <boost/regex.hpp>
void main()
{
const std::string s = "% d. i.p.p. tototo attendu";
boost::regex re("[a-zA-Z]{4,7}");
boost::smatch matches;
if( boost::regex_search( s, matches, re ) )
{
std::string value( matches[0].first, matches[0].second );
cout << value << " ";
}
}
Fine, I found attendu, but the output is only tototo. It's not incrementing.
The return value is "tototo attendu". I was wondering if I can return each value at a time instead of 1 string

I was wondering if I can return each value at a time instead of 1 string
The only way of doing this seems to be via regex_iterator. Here’s an example using Boost:
#include <boost/regex.hpp>
#include <iostream>
int main() {
const std::string s = "% d. i.p.p. tototo attendu";
boost::regex rgx("([a-zA-Z]{2,20})");
boost::smatch match;
boost::sregex_iterator begin{s.begin(), s.end(), rgx},
end{};
for (auto&& i = begin; i != end; ++i)
std::cout << "match: " << *i << '\n';
}
This yields:
match: tototo
match: attendu
Two things:
The return type of main is always int. Your code shouldn’t even compile.
I’ve added parentheses around your (first, which was correct!) regular expression so that it creates a capture for each match. The iterators then iterate over each match in turn.

Related

Regex not able to show chars after space

I want to break this string into two parts
{[data1]name=NAME1}{[data2]name=NAME2}
1) {[data1]name=NAME1}
2) {[data2]name=NAME2}
I am using a regex to achieve this, and it works fine with the above string, but if I add a space to the name, the regex does not pick up the characters after the space.
{[data1]name=NAME 1}{[data2]name=NAME 2}
With this string, it breaks only up to NAME and does not show the characters 1 and 2.
This is my code:
std::string stdstrData = "{[data1]name=NAME1}{[data2]name=NAME2}";
std::vector<std::string> commandSplitUnderScore;
std::regex re("\\}");
std::sregex_token_iterator iter(stdstrData.begin(), stdstrData.end(), re, -1);
std::sregex_token_iterator end;
while (iter != end) {
if (iter->length()) {
commandSplitUnderScore.push_back(*iter);
}
++iter;
}
for (auto& str : commandSplitUnderScore) {
std::cout << str << std::endl;
}
A good place to start is to use regex101.com and debug your regex before putting it into your C++ code, e.g. https://regex101.com/r/ID6OSj/1 (don't forget to escape your C++ string properly when copying the regex you made on that site).
Example:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string input{ "{[data]name=NAME1}{[data]name=NAME2}" };
std::regex rx{ "(\\{\\[data\\].*?\\})(\\{\\[data\\].*?\\})" };
std::smatch match;
if (std::regex_match(input, match, rx))
{
std::cout << match[1] << "\n"; // match[1] is first group
std::cout << match[2] << "\n"; // match[2] is second group
}
return 0;
}

Regex search overlapping matches c++11

What regex expression should I use to search all occurrences that match:
Start with 55 or 66
followed by a minimum 8 characters in the range of [0-9a-fA-F] (HEX numbers)
Ends with \r (a carriage return)
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result but not the second one.
Edit clarification
I search the string using Regex. It'll select the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it only contains base-16 (HEX) numbers, and ends with a carriage return.
[start] [information part minimum 8 chars] [end symbol-carriage return]
I'm using the std::regex library in C++11 with the ECMAScript flag.
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string log("0055\r0655036608090705\r");
std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
std::smatch sm;
while(regex_search(log, sm, r))
{
std::cout << sm.str() << '\n';
log = sm.str();
log += sm.suffix();
log[0] = 'a' ;
}
}
** Edit: Working regex solution based on comments **
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string s("0055\r06550003665508090705\r0970");
std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
auto words_begin =
std::sregex_iterator(s.begin(), s.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found "
<< std::distance(words_begin, words_end)
<< " words:\n";
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = s.substr(match.position(1), match.length(1) - 1); //-1 cr
std::cout << match_str << " =" << match.position(1) << '\n';
}
}
You are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::string subject("0055\r06550003665508090705\r0970");
try {
std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\n";
next++;
}
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

C++ regex library

I have this sample code
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
std::regex e (reg);
std::cout << "Target sequence: " << s << std::endl;
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
I need to get the number between pi and " (i.e. piMYNUMBER").
In an online regex service my regex (?<=pi)(\d+)(?=") works fine, but the C++ regex doesn't match anything.
Who knows what is wrong with my expression?
Best regards
That is correct, C++ std::regex flavors do not support lookbehinds. You need to capture the digits between pi and ":
#include <iostream>
#include <vector>
#include <regex>
int main() {
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
// std::string reg(R"(pi(\d+)\")");
std::regex e (reg);
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
std::sregex_token_iterator());
// Demo printing the results:
std::cout << "Number of matches: " << results.size() << std::endl;
for( auto & p : results ) std::cout << p << std::endl;
return 0;
}
See the C++ demo. Output:
Number of matches: 1
656
Here, pi(\d+)" pattern matches
pi - a literal substring
(\d+) - captures 1+ digits into Group 1
" - consumes a double quote.
Note the fourth argument to std::sregex_token_iterator, it is 1 because you need to collect only Group 1 values.

regex_iterator not matching groups in regular expression

How can I extract Test and Again from the string s in the code below?
Currently I am using regex_iterator, and it doesn't seem to be matching the groups in the regular expression: I am getting {{Test}} and {{Again}} in the output.
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
return 0;
}
I also tried using regex_search, but it does not work with multiple occurrences and only gives Test as output:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
if (std::regex_search(s, match, rgx,std::regex_constants::match_any))
{
std::cout<<"Match size is "<<match.size()<<std::endl;
for(auto elem:match)
std::cout << "match: " << elem << '\n';
}
}
Also, as a side note: why are two backslashes needed to escape { or }?
To access the contents of the capturing group you need to use .str(1):
std::cout << match.str(1) << std::endl;
See the C++ demo:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
// std::regex rgx("\\{\\{(\\w+)\\}\\}");
// Better, use a raw string literal:
std::regex rgx(R"(\{\{(\w+)\}\})");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << std::endl;
next++;
}
return 0;
}
Output:
Test
Again
Note you do not have to use double backslashes to define a regex escape sequence inside raw string literals (here, R"(pattern_here)").

How to check for success in c++11 std::regex_replace?

I'd like to do the C++11 equivalent of a Perl checked-replacement operation:
my $v = "foo.rat";
if ( $v =~ s/\.rat$/.csv/ )
{
...
}
I can do the replacement without trouble:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string s{ "foo.rat" } ;
std::regex reg{ R"((.*)\.rat$)" } ;
s = std::regex_replace( s, reg, "$1.csv" ) ;
std::cout << s << std::endl ;
s = std::regex_replace( "foo.noo", reg, "$1.csv" ) ;
std::cout << s << std::endl ;
return 0 ;
}
This gives:
foo.csv
foo.noo
Notice that the replace operation on the non-matching expression doesn't throw an error (which is what I expected).
Looking at the regex_replace documentation, it's not obvious to me how to check for the success of the replace operation. I could do a string compare, but that seems backwards?
Try to find a match with std::regex_match or std::regex_search, check whether something matched, then replace the found portion of the string using std::string::replace. That shouldn't lead to a performance loss.
Just to add to the accepted answer: it can also be done with a std::regex_iterator. This may be handy when multiple replacements may take place.
Iterator std::regex_iterator repeatedly calls std::regex_search() until all matches are found. If the position of the iterator at the beginning and the position at the end are the same, no match was found.
Function bool regex_replace(std::string &str, const std::string &re, const std::string& replacement) implements this behaviour:
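#include <iostream>
#include <regex>
#include <string>
bool regex_replace(std::string &str, const std::string &re, const std::string& replacement) {
    std::regex regexp(re);
    //Search regex
    std::sregex_iterator begin = std::sregex_iterator(str.begin(), str.end(), regexp);
    std::sregex_iterator end = std::sregex_iterator();
    //no match: leave str untouched
    if (begin == end)
        return false;
    //build the result separately: modifying str while iterating over it
    //would invalidate the iterators held by the matches
    std::string result;
    std::size_t last = 0;
    for (std::sregex_iterator i = begin; i != end; ++i) {
        const std::size_t pos = static_cast<std::size_t>(i->position());
        result.append(str, last, pos - last);
        result += replacement;
        last = pos + static_cast<std::size_t>(i->length());
    }
    result.append(str, last, std::string::npos);
    str = result;
    //at least one match was found and replaced
    return true;
}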
This function operates in place: when it returns, str holds the replacements. It returns true only if at least one replacement was made.
The following code shows how to use it to make multiple replacements and detect whether any was made:
int main(int argc, char** argv) {
std::string rgx("[0-9]");
std::string str("0a1b2c3d4e5");
std::string replacement("?");
bool found = regex_replace(str, rgx, replacement);
std::cout << "Found any: " << (found ? "true" : "false") << std::endl;
std::cout << "string: " << str << std::endl;
return 0;
}
The code replaces every digit with a question mark '?':
Found any: true
string: ?a?b?c?d?e?
Use the std::regex_constants::format_no_copy flag to change the behavior of regex_replace(); see the code below.
The returned string will now be empty if the match failed.
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string s{ "foo.rat" } ;
std::regex reg{ R"((.*)\.rat$)" } ;
auto rxMatchFlag = std::regex_constants::format_no_copy; //<---use this to modify the behavior of regex_replace when matching failed.
s = std::regex_replace( s, reg, "$1.csv", rxMatchFlag) ;
if(!s.empty()) std::cout << s << std::endl ;
else std::cout << "failed match" << std::endl;
s = std::regex_replace( "foo.noo", reg, "$1.csv", rxMatchFlag) ;
if(!s.empty()) std::cout << s << std::endl ;
else std::cout << "failed match" << std::endl;
return 0 ;
}
For the other flags, see the documentation of std::regex_constants::match_flag_type.
I don't believe there's any direct way to find out whether any replacements were made.
(Don't confuse this with "success / not success", which is not quite the same thing.)