Find a pattern in a string with a regular expression in C++

I have the following string: E:\501_Document_60_1_R.xml
I am trying to find the pattern "_R".
I am using the following: boost::regex rgx("[R]");
But it's not working: "Empty Match".
Thank you.
Code:
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
using namespace std;
vector<string> findMono(string s)
{
    vector<string> vec;
    boost::regex rgx("[R]");
    boost::sregex_iterator begin {s.begin(), s.end(), rgx},
                           end {};
    for (boost::sregex_iterator& i = begin; i != end; ++i)
    {
        boost::smatch m = *i;
        vec.push_back(m.str());
    }
    return vec;
}
int main()
{
    // The backslash must be doubled: in "E:\501..." the \501 is read as an octal escape.
    vector<string> m = findMono("E:\\501_Document_60_1_R.xml");
    if (m.size() > 0) cout << "Match" << endl;
    else cout << "No Match" << endl;
    return 0;
}

As we discussed in the comments, "_R" will technically work as your regular expression given your current data set.
However, I'd strongly consider something more sophisticated to avoid running into problems in the event that your paths contain the sequence "_R" elsewhere. It's fairly easy to protect yourself against that problem, it's good general practice, and it will most likely avoid bugs in the future.
Here is a very basic working example:
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
std::vector<std::string> findMono(const std::string& path)
{
boost::regex rgx("_R");
boost::sregex_iterator begin {path.begin(), path.end(), rgx}, end {};
std::vector<std::string> matches;
for (boost::sregex_iterator& i = begin; i != end; ++i) {
matches.push_back((*i).str());
}
return matches;
}
int main(int argc, char * argv[])
{
const std::string path = "E:\\501_Document_60_1_R.xml";
const std::vector<std::string>& matches = findMono(path);
for (const auto& match : matches) {
std::cout << match << std::endl;
}
return 0;
}
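To make the suggestion about guarding against "_R" elsewhere in the path concrete, here is a minimal sketch that anchors the pattern to the end of the file name. It assumes the marker always appears as the "_R.xml" suffix, it uses std::regex rather than Boost.Regex so no extra library is needed, and the name isMonoFile is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only when the path ends in "_R.xml",
// so an "_R" elsewhere in the path (e.g. a "_Reports" folder) is ignored.
bool isMonoFile(const std::string &path)
{
    // "\." is a literal dot; "$" anchors the match to the end of the string.
    static const std::regex rgx(R"(_R\.xml$)");
    return std::regex_search(path, rgx);
}
```

For example, isMonoFile("E:\\_Reports\\501_Document_60_1_L.xml") returns false even though the path contains "_R". The same pattern string should also work with boost::regex and boost::regex_search.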

Related

Why doesn't Boost.Regex find all the results I expect?

I have a C++ sample and I want to find all queries inside a relative URI (like /class?class_id=-1&course_ref=1&student_ref=2&score_ref=1). If it worked, I would find all the results ("class_id=-1", "course_ref=1", "student_ref=2", "score_ref=1"), but only "course_ref=1" was found! Here's my code:
#include <iostream>
#include <boost/regex.hpp>
int main() {
std::string url = "/class?class_id=-1&course_ref=1&student_ref=2&score_ref=1";
const boost::regex queries_pattern("(?<=(\?|\&))[a-zA-Z0-9_=-]+");
boost::smatch queries_result;
boost::regex_search(url, queries_result, queries_pattern);
std::string results("");
for (unsigned int i = 0; i <= queries_result.size(); i++) {
if (!queries_result[i].str().empty())
std::cout << queries_result[i] << std::endl;
}
std::cin.get();
}
I also tried other regex patterns (without look-behind), but none of them worked. I also tested std::regex and Boost.Xpressive, and no results were extracted.
Does anyone know why this fails? Or is there a different solution? Thanks.
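It turns out I had to loop with an iterator instead of reading the results directly: regex_search stops at the first match, and the indices of the resulting smatch refer to the capture groups of that single match, not to successive matches. Here's the working code: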
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string url = "/class?class_id=-1&course_ref=1&student_ref=2&score_ref=1";
const boost::regex pattern("[a-zA-Z0-9_=-]+((?=&)|(?=$))");
boost::sregex_token_iterator iter(url.begin(), url.end(), pattern, 0);
boost::sregex_token_iterator end;
for (; iter != end; ++iter) {
std::cout << *iter << '\n';
}
std::cin.get();
}
Thanks to "the fourth bird" for pointing this out.
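The question also mentions that std::regex extracted nothing. One likely factor is that std::regex's default ECMAScript grammar has no look-behind, so the original pattern cannot be ported directly. A sketch of a standard-library alternative, assuming every query parameter has the form key=value: it consumes the '?' or '&' and reads the pair from capture group 1. The name extractQueries is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every "key=value" pair that follows
// a '?' or '&' in a relative URI. Instead of a look-behind, the separator
// is matched and the pair itself is taken from capture group 1.
std::vector<std::string> extractQueries(const std::string &url)
{
    static const std::regex pattern(R"([?&]([^&=]+=[^&]*))");
    std::vector<std::string> out;
    for (std::sregex_iterator it(url.begin(), url.end(), pattern), end; it != end; ++it)
        out.push_back((*it)[1].str()); // group 1: the pair without its separator
    return out;
}
```

Called on the URL from the question, this should return all four pairs in order.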

regex_iterator not matching groups in regular expression

How can I extract Test and Again from the string s in the code below?
Currently I am using regex_iterator, and it doesn't seem to match the groups in the regular expression: I am getting {{Test}} and {{Again}} in the output.
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
return 0;
}
I also tried using regex_search, but it does not work with multiple matches and only outputs Test:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
if (std::regex_search(s, match, rgx,std::regex_constants::match_any))
{
std::cout<<"Match size is "<<match.size()<<std::endl;
for(auto elem:match)
std::cout << "match: " << elem << '\n';
}
}
Also, as a side note: why are two backslashes needed to escape { or }?
To access the contents of the capturing group you need to use .str(1):
std::cout << match.str(1) << std::endl;
See the C++ demo:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
// std::regex rgx("\\{\\{(\\w+)\\}\\}");
// Better, use a raw string literal:
std::regex rgx(R"(\{\{(\w+)\}\})");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << std::endl;
next++;
}
return 0;
}
Output:
Test
Again
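Note that inside a raw string literal (here, R"(pattern_here)") you do not need double backslashes to write a regex escape sequence. In an ordinary string literal the compiler consumes one level of escaping first, so "\\{" reaches the regex engine as \{; that is why the backslashes had to be doubled.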
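The question's second attempt used std::regex_search, which returns only the first match per call. A minimal sketch of looping regex_search by restarting the search at match.suffix() and reading capture group 1; the name extractPlaceholders is hypothetical, and the pattern is the raw-string form from the answer:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: extracts the word inside each "{{...}}".
// Each regex_search call finds only the FIRST match in [first, end),
// so the loop restarts at match.suffix(), the text after that match.
std::vector<std::string> extractPlaceholders(const std::string &s)
{
    static const std::regex rgx(R"(\{\{(\w+)\}\})");
    std::vector<std::string> out;
    std::smatch match;
    auto first = s.cbegin();
    while (std::regex_search(first, s.cend(), match, rgx)) {
        out.push_back(match.str(1)); // capture group 1, without the braces
        first = match.suffix().first;
    }
    return out;
}
```

For the string in the question this should yield Test and Again, the same result as the regex_iterator version.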

How to check for success in c++11 std::regex_replace?

I'd like to do the C++11 equivalent of a Perl checked-replacement operation:
my $v = "foo.rat";
if ( $v =~ s/\.rat$/.csv/ )
{
...
}
I can do the replacement without trouble:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string s{ "foo.rat" } ;
std::regex reg{ R"((.*)\.rat$)" } ;
s = std::regex_replace( s, reg, "$1.csv" ) ;
std::cout << s << std::endl ;
s = std::regex_replace( "foo.noo", reg, "$1.csv" ) ;
std::cout << s << std::endl ;
return 0 ;
}
This gives:
foo.csv
foo.noo
Notice that the replace operation on the non-matching expression doesn't throw an error (which is what I expected).
Looking at the regex_replace documentation, it's not obvious to me how to check for the success of the replace operation. I could do a string compare, but that seems backwards?
Try to find a match with std::regex_match or std::regex_search, check whether anything matched, and then replace the matched portion of the string using std::string::replace. That shouldn't lead to any noticeable performance loss.
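A minimal sketch of that search-then-replace approach, mimicking Perl's single (non-/g) s/// with a boolean result. The name replaceIfMatch is hypothetical; it uses std::match_results::format to expand $1-style references in the replacement text:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: one checked replacement, like Perl's
// `if ($v =~ s/re/fmt/)`. Returns false and leaves s untouched
// when the pattern does not match.
bool replaceIfMatch(std::string &s, const std::regex &re, const std::string &fmt)
{
    std::smatch m;
    if (!std::regex_search(s, m, re))
        return false;
    // Expand $1 etc. and record the match span before modifying s,
    // because m refers into s.
    const std::string replacement = m.format(fmt);
    const auto pos = static_cast<std::string::size_type>(m.position(0));
    const auto len = static_cast<std::string::size_type>(m.length(0));
    s.replace(pos, len, replacement);
    return true;
}
```

With the pattern (.*)\.rat$ and the format "$1.csv", "foo.rat" should become "foo.csv" with a true result, while "foo.noo" stays unchanged and the call returns false.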
To add to the accepted answer: this can also be done with std::regex_iterator, which is handy when multiple replacements may take place.
std::regex_iterator repeatedly calls std::regex_search() until all matches are found. If the begin iterator already equals the end-of-sequence iterator, no match was found.
The function bool regex_replace(std::string &str, const std::string &re, const std::string& replacement) implements this behaviour:
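#include <iostream>
#include <regex>
#include <string>
#include <utility>
#include <vector>
bool regex_replace(std::string &str, const std::string &re, const std::string& replacement) {
    std::regex regexp(re);
    // Search regex and record every match first: str must not be modified
    // while a regex_iterator still points into it.
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::sregex_iterator i(str.begin(), str.end(), regexp), end; i != end; ++i)
        hits.emplace_back(i->position(), i->length());
    // Replace from the last match backwards, so earlier positions stay valid
    // even when the replacement is longer or shorter than the match.
    for (auto h = hits.rbegin(); h != hits.rend(); ++h)
        str.replace(h->first, h->second, replacement);
    // Returns true if at least one match was found and replaced
    return !hits.empty();
}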
This function operates in place: on return, str contains the replacements, and the function returns true only if at least one replacement was made.
The following code shows how to use it to make multiple replacements and detect whether any was made:
int main(int argc, char** argv) {
std::string rgx("[0-9]");
std::string str("0a1b2c3d4e5");
std::string replacement("?");
bool found = regex_replace(str, rgx, replacement);
std::cout << "Found any: " << (found ? "true" : "false") << std::endl;
std::cout << "string: " << str << std::endl;
return 0;
}
The code replaces every digit with a question mark '?':
Found any: true
string: ?a?b?c?d?e?
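Calling str.replace() while a regex_iterator is still walking the same string is risky: once the replacement length differs from the match length, later match positions no longer line up with the modified string, and a reallocation can invalidate the iterators the regex_iterator holds. A sketch of an alternative that keeps the boolean result but assembles the output in a separate string; the name replaceAllChecked is hypothetical, and the replacement is inserted literally (no $1 expansion):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of re with `replacement`
// and reports whether anything matched. The source string is not
// touched until iteration has finished.
bool replaceAllChecked(std::string &str, const std::regex &re, const std::string &replacement)
{
    std::string result;
    bool found = false;
    auto last = str.cbegin(); // end of the previous match
    for (std::sregex_iterator it(str.begin(), str.end(), re), end; it != end; ++it) {
        found = true;
        result.append(last, (*it)[0].first); // unmatched text before this match
        result.append(replacement);          // literal replacement text
        last = (*it)[0].second;
    }
    if (!found)
        return false;
    result.append(last, str.cend());         // tail after the last match
    str.swap(result);
    return true;
}
```

Unlike the in-place loop, this also behaves correctly when the replacement is longer or shorter than the matched text, for example replacing "[0-9]+" with "<n>".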
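Use the std::regex_constants::format_no_copy flag to change the behavior of regex_replace(); see the code below.
The returned string will now be empty if the match fails. Be aware that format_no_copy also drops every part of the input that is not matched, so this check works here only because the pattern (.*)\.rat$ matches the whole string.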
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string s{ "foo.rat" } ;
std::regex reg{ R"((.*)\.rat$)" } ;
auto rxMatchFlag = std::regex_constants::format_no_copy; //<---use this to modify the behavior of regex_replace when matching failed.
s = std::regex_replace( s, reg, "$1.csv", rxMatchFlag) ;
if(!s.empty()) std::cout << s << std::endl ;
else std::cout << "failed match" << std::endl;
s = std::regex_replace( "foo.noo", reg, "$1.csv", rxMatchFlag) ;
if(!s.empty()) std::cout << s << std::endl ;
else std::cout << "failed match" << std::endl;
return 0 ;
}
For the other flags, see the std::regex_constants::match_flag_type documentation.
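One caveat with format_no_copy: it suppresses the copying of all unmatched text, not only the output on failure, so the result contains just the formatted matches. A sketch contrasting a whole-string pattern with a suffix-only pattern; replaceNoCopy is a hypothetical wrapper around std::regex_replace:

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: regex_replace with format_no_copy, which emits
// ONLY the formatted matches and drops all unmatched input text.
std::string replaceNoCopy(const std::string &s, const std::regex &re, const std::string &fmt)
{
    return std::regex_replace(s, re, fmt, std::regex_constants::format_no_copy);
}
```

With the suffix-only pattern \.rat$ and the format ".csv", "foo.rat" becomes ".csv" rather than "foo.csv", so an empty-string check is a reliable success test only when the pattern covers the entire input.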
I don't believe there's any direct way to find out whether any replacements were made.
(Don't confuse this with "success / not success", which is not quite the same thing.)

Most efficient way to parse a string using C++ features

This might be a stupid question (I hope not), but it caught my attention and I'm trying to figure it out. What is the most efficient way to parse a string using C++ features?
I appreciate everyone's comments; like everyone else, I'm trying to become a better programmer!
Here is how I would do it right now with my current knowledge:
#include <cstdlib> // system()
#include <iostream>
#include <string>
using std::cout;
using std::string;
using std::endl;
void parseLine(string &line)
{
    constexpr char DELIMITER_ONE = '|';
    constexpr char DELIMITER_TWO = '[';
    for (string::size_type i = 0; i < line.length(); )
    {
        if (line[i] == DELIMITER_ONE || line[i] == DELIMITER_TWO)
        {
            // Do not advance i: erase() shifts the next character into position i.
            line.erase(i, 1);
        }
        else
        {
            ++i;
        }
    }
    cout << line << endl;
}
int main()
{
std::string testString = "H|el[l|o|";
parseLine(testString);
system("pause");
return 0;
}
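#include <algorithm>
#include <string>
void parseLine(std::string &line)
{
    constexpr char DELIMITER_ONE = '|';
    constexpr char DELIMITER_TWO = '[';
    // remove_if moves the kept characters to the front; erase trims the leftover tail.
    line.erase(std::remove_if(line.begin(), line.end(),
                              [](char c) { return c == DELIMITER_ONE || c == DELIMITER_TWO; }),
               line.end());
}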
See also: erase-remove idiom
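A possible generalization of the erase-remove answer, sketched under the assumption that the set of characters to strip may vary: the delimiters are passed as a string instead of two constants. removeChars is a hypothetical name. Like the answer, it makes a single pass over the string, whereas erasing characters one at a time inside a loop shifts the tail of the string on every erase:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes every character of `delimiters` from `line`
// using the erase-remove idiom (one pass, then one erase of the tail).
std::string removeChars(std::string line, const std::string &delimiters)
{
    line.erase(std::remove_if(line.begin(), line.end(),
                              [&delimiters](char c) {
                                  return delimiters.find(c) != std::string::npos;
                              }),
               line.end());
    return line;
}
```

For example, removeChars("H|el[l|o|", "|[") should return "Hello".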
Another way is to use the Boost.Regex library; see the code below:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main(){
std::string testString = "H|el[l|o|";
boost::regex rx("\\||\\[");
std::string replace = "";
std::string out = boost::regex_replace(testString, rx, replace);
std::cout << out << std::endl;
}
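Since C++11, regular expressions are part of the standard library. Note that this version keeps only runs of letters, so unlike the answers above it also drops digits and any other non-letter characters:
#include <cstdlib> // system()
#include <iostream>
#include <string>
#include <regex>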
std::string parseLine(const std::string& line);
int main() {
std::string testString = "H|el[l|o|";
std::string result = parseLine(testString);
std::cout << result << std::endl;
system("pause");
return 0;
}
std::string parseLine(const std::string& line) {
std::string input_string;
std::string result;
std::smatch sm;
std::regex r("([a-zA-Z]+)");
for(input_string = line; std::regex_search(input_string, sm, r); input_string = sm.suffix()) {
result.append(sm[0].str());
}
return result;
}
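Keeping only letters is not quite the same as removing the two delimiters. A sketch of the std::regex counterpart of the Boost.Regex answer, which deletes only '|' and '[' with std::regex_replace; stripDelimiters is a hypothetical name:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: deletes every '|' or '[' and keeps all other
// characters, digits included.
std::string stripDelimiters(const std::string &line)
{
    static const std::regex delimiters(R"([|\[])"); // character class: '|' or '['
    return std::regex_replace(line, delimiters, "");
}
```

For the test string "H|el[l|o|" this should produce "Hello", and digits such as those in "a1|b2[" are preserved.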

Different behavior in C regex vs. C++11 regex

I need code that splits math-notation permutations into their elements. Suppose a permutation that can be written as any of these strings:
"(1,2,5)(3,4)" or "(3,4)(1,2,5)" or "(3,4)(5,1,2)"
These are the patterns I've tried:
([0-9]+[ ]*,[ ]*)*[0-9]+ for each permutation cycle. This would split the "(1,2,5)(3,4)" string into the two strings "1,2,5" and "3,4".
([0-9]+) for each element in a cycle. This would split each cycle into individual numbers.
When I tried these patterns online, they worked well. I have also used them with the C++11 regex library, with good results:
#include <iostream>
#include <string>
#include <regex>
void elements(const std::string &input)
{
const std::regex ElementRegEx("[0-9]+");
for (std::sregex_iterator Element(input.begin(), input.end(), ElementRegEx); Element != std::sregex_iterator(); ++Element)
{
const std::string CurrentElement(*Element->begin());
std::cout << '\t' << CurrentElement << '\n';
}
}
void cycles(const std::string &input)
{
const std::regex CycleRegEx("([0-9]+[ ]*,[ ]*)*[0-9]+");
for (std::sregex_iterator Cycle(input.begin(), input.end(), CycleRegEx); Cycle != std::sregex_iterator(); ++Cycle)
{
const std::string CurrentCycle(*Cycle->begin());
std::cout << CurrentCycle << '\n';
elements(CurrentCycle);
}
}
int main(int argc, char **argv)
{
std::string input("(1,2,5)(3,4)");
std::cout << "input: " << input << "\n\n";
cycles(input);
return 0;
}
The output when compiled with Visual Studio 2010 (10.0):
input: (1,2,5)(3,4)
1,2,5
1
2
5
3,4
3
4
Unfortunately, I cannot use the C++11 tools in my project: it will run on a Linux platform and must be compiled with gcc 4.2.3, so I'm forced to use the C regex library from the regex.h header. Using the same patterns with this library, I get different results:
Here is the test code:
#include <iostream>
#include <string>
#include <regex.h>
#define MAX_MATCHES 10 // assumed value: the original post does not show it
void elements(const std::string &input)
{
    regex_t ElementRegEx;
    regcomp(&ElementRegEx, "([0-9]+)", REG_EXTENDED);
    regmatch_t ElementMatches[MAX_MATCHES];
    if (!regexec(&ElementRegEx, input.c_str(), MAX_MATCHES, ElementMatches, 0))
    {
        int Element = 0;
        while ((ElementMatches[Element].rm_so != -1) && (ElementMatches[Element].rm_eo != -1))
        {
            regmatch_t &ElementMatch = ElementMatches[Element];
            const std::string CurrentElement(input.substr(ElementMatch.rm_so, ElementMatch.rm_eo - ElementMatch.rm_so));
            std::cout << '\t' << CurrentElement << '\n';
            ++Element;
        }
    }
    regfree(&ElementRegEx);
}
void cycles(const std::string &input)
{
    regex_t CycleRegEx;
    regcomp(&CycleRegEx, "([0-9]+[ ]*,[ ]*)*[0-9]+", REG_EXTENDED);
    regmatch_t CycleMatches[MAX_MATCHES];
    if (!regexec(&CycleRegEx, input.c_str(), MAX_MATCHES, CycleMatches, 0))
    {
        int Cycle = 0;
        while ((CycleMatches[Cycle].rm_so != -1) && (CycleMatches[Cycle].rm_eo != -1))
        {
            regmatch_t &CycleMatch = CycleMatches[Cycle];
            const std::string CurrentCycle(input.substr(CycleMatch.rm_so, CycleMatch.rm_eo - CycleMatch.rm_so));
            std::cout << CurrentCycle << '\n';
            elements(CurrentCycle);
            ++Cycle;
        }
    }
    regfree(&CycleRegEx);
}
int main(int argc, char **argv)
{
    std::string input("(1,2,5)(3,4)");
    std::cout << "input: " << input << "\n\n";
    cycles(input);
    return 0;
}
The expected output is the same as with the C++11 regex, but the actual output was:
input: (1,2,5)(3,4)
1,2,5
1
1
2,
2
2
Finally, the questions are:
Could someone give me a hint about where I'm misunderstanding the C regex engine?
Why is the behavior different between the C regex and the C++ regex?
You're misunderstanding the output of regexec. The pmatch buffer (after pmatch[0]) is filled with sub-matches of the regex, not with consecutive matches in the string.
For example, if your regex is [a-z]([+ ])([0-9]) matched against x+5, then pmatch[0] will reference x+5 (the whole match), and pmatch[1] and pmatch[2] will reference + and 5 respectively.
You need to repeat the regexec in a loop, starting from the end of the previous match:
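// Replacement for the loop in elements(): call regexec repeatedly,
// each time on the text after the previous match.
int start = 0;
while (!regexec(&ElementRegEx, input.c_str() + start, MAX_MATCHES, ElementMatches, 0))
{
    // Only pmatch[0] (the whole match) matters here; its offsets are
    // relative to input.c_str() + start, not to the start of input.
    regmatch_t &ElementMatch = ElementMatches[0];
    std::string CurrentElement(input.substr(start + ElementMatch.rm_so, ElementMatch.rm_eo - ElementMatch.rm_so));
    std::cout << '\t' << CurrentElement << '\n';
    start += ElementMatch.rm_eo; // resume right after this match
}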
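Putting that loop into a reusable form, here is a minimal sketch of a hypothetical findAll helper for the POSIX <regex.h> API. It assumes POSIX extended syntax, passes REG_NOTBOL when resuming mid-string so that ^ cannot match there, and steps past empty matches to avoid an infinite loop. It uses no C++11 features, which matters for the gcc 4.2.3 constraint in the question:

```cpp
#include <regex.h>   // POSIX regcomp / regexec / regfree
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns every non-overlapping match of `pattern`
// (POSIX extended syntax) in `input` by re-running regexec on the
// remaining text after each match.
std::vector<std::string> findAll(const std::string &input, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        throw std::runtime_error("invalid regex");

    std::vector<std::string> out;
    std::size_t start = 0;
    regmatch_t m[1]; // only the whole match (pmatch[0]) is needed
    while (start <= input.size() &&
           regexec(&re, input.c_str() + start, 1, m, start > 0 ? REG_NOTBOL : 0) == 0)
    {
        // Offsets in m[0] are relative to input.c_str() + start.
        out.push_back(input.substr(start + m[0].rm_so, m[0].rm_eo - m[0].rm_so));
        // Advance past the match; on an empty match step one extra char.
        start += (m[0].rm_eo > m[0].rm_so) ? m[0].rm_eo : m[0].rm_eo + 1;
    }
    regfree(&re);
    return out;
}
```

For example, findAll("(1,2,5)(3,4)", "([0-9]+[ ]*,[ ]*)*[0-9]+") should yield "1,2,5" and "3,4", and calling findAll on each cycle with "[0-9]+" yields its individual elements, mirroring the C++11 output.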