C++11 regex: finding all matches between two slash characters

Let's assume I have a string like: String1/String2/String3/String4
I'd like to use a regex to find every match between slash characters, plus everything after the last / character, so the output would be: String2, String3, String4
smatch match_str;
regex re_str("\\/(.*)");
regex_match( s, match_str, re_str );
cout << match_str[1] << endl;
cout << match_str[2] << endl;
cout << match_str[3] << endl;

Note that regex_match requires a full string match. Also, .* matches 0 or more characters other than a newline, as many as possible (that is, it matches until the very end of the given line).
Also, / symbol in a C++ regex does not need to be escaped.
Here is working code:
#include <string>
#include <iostream>
#include <regex>
int main() {
    std::regex r("[^/]+"); // one or more characters other than /
    std::string s = "String1/String2/String3/String4";
    for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
         i != std::sregex_iterator();
         ++i)
    {
        std::smatch m = *i;
        std::cout << m[0] << '\n';
    }
    return 0;
}
Results:
String1
String2
String3
String4
If you need to specify the initial boundary, use
std::regex rex1("(?:^|/)([^/]+)");
The values will then be in m[1] rather than in m[0].

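You can use this one:
\\/([^\\/]*)
The * belongs inside the capture group; written as ([^\\/])*, the group would hold only the last character of each segment.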

Here is a way to do it using string iterators (untested).
std::string strInput = "String1/String2/String3/String4";
std::string::const_iterator start = strInput.begin();
std::string::const_iterator end = strInput.end();
std::smatch m; // avoid names like _M: underscore + capital letter is reserved
std::regex rx("/([^/]*)");
while (std::regex_search(start, end, m, rx))
{
    std::string strSubDir = m[1].str(); // Do something with subdir
    std::cout << strSubDir << std::endl; // Debug print subdir
    start = m[0].second; // Advance the start iterator past this match
}
Output:
String2
String3
String4

Related

Regex expression for matching $(..)

Suppose I have a string that looks like this
std::string str = "I have $(xb) books and $(cs) pens";
What is the easiest way to extract xb and cs (the characters surrounded by $( and )) from the statement?
Is it possible to do some regex magic that would return a vector that contains xb and cs?
I am trying something like this
regex rgx("$\\(*\\)");//$(*)
smatch result;
regex_search(var, result, rgx);
for(size_t i=0; i<result.size(); ++i){
cout << result[i] << endl;
}
However, I am not having any success. What I would like is xb and cs in a vector. I tried coming up with an expression but I can't figure it out.
$ has a special meaning in a regex (it is an anchor), so it must be escaped, and in your regex the * applies to the escaped opening parenthesis rather than meaning "any characters".
You could use a capturing group that starts right after the opening parenthesis and matches any characters other than a closing parenthesis, i.e. use the regex \$\(([^)]*)\):
std::string str = "I have $(xb) books and $(cs) pens";
std::regex rgx("\\$\\(([^)]*)\\)");
for (auto pos = std::sregex_iterator(str.begin(), str.end(), rgx), end = std::sregex_iterator();
pos != end; ++pos) {
const std::smatch& match = *pos;
std::cout << match[1] << std::endl;
}

Make dollar and caret only match at beginning/end of string, not before/after embedded newlines

The short program below outputs
<hello>
<world>
demonstrating that ^ and $ also match after and before \n, respectively. How can I change this behavior so that they match only at the beginning and end of the string? (In that case, there would be no match for the example str input.)
#include <boost/regex.hpp>
#include <iostream>
int main() {
std::string tokenRegex = "^[^\n\r]+$";
std::string str = "hello\nworld";
boost::sregex_iterator rit{std::begin(str), std::end(str), boost::regex{tokenRegex}};
boost::sregex_iterator end;
while (rit != end) {
std::cout << '<' << rit->str() << '>' << '\n';
++rit;
}
}
You need to use the match_single_line flag:
boost::sregex_iterator rit{
std::begin(str),
std::end(str),
boost::regex{tokenRegex},
boost::match_single_line // <-- here
};
This is a match flag - you specify it when matching (or constructing an iterator which matches), not when compiling the regex.

Add to a stack with a regex

I'm learning regex and C++ and I want to build a postfix expression. To do this I want to split my string like this:
String : 56*((6+2)/(8-7)* 2^3)
List : 56 | * | ( | ( | 6 | + | 2 | ) | / | ( ....
Currently I have:
void Stack::findIT() {
std::string var = "56*((6+2)/(8-7)* 2^3)";
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r("([0-9]*|[+*\\-\\/%]|[()])");
std::smatch m;
std::regex_search(str, m, r);
for(auto v: m) std::cout << v << std::endl;
}
I want to fill:
std::stack _operators;
std::stack _operands;
and to do this, extract the tokens from the string with the regex.
But why do I get an empty string when I use this code?
With std::regex r("([0-9]*|[+*\\-\\/%]|[()])");, empty string is matched.
You probably want: "[0-9]+|[+*/%^()-]"
You also have to iterate for your search:
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r("[0-9]+|[+*/%^()-]");
std::smatch m;
while (std::regex_search(str, m, r)) {
std::cout << m[0] << std::endl;
str = m.suffix();
}
To collect all the matches in a string you should really be using std::sregex_iterator.
I would recommend using raw string literals so you don't have to worry about C++ escaping in your regex: R"~()~" (the expression goes in the middle, no extra escaping required).
I changed your regex slightly. In character sets you have to put - at the beginning or end, or escape it (otherwise it's a range separator). I also added the ability to read decimal numbers.
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r(R"~(\d+(?:\.\d+)?|[-+*\/%^]|[()])~"); // (?: ... ) is a non-capturing group
std::sregex_iterator m_end;
std::sregex_iterator m(std::begin(str), std::end(str), r);
for(; m != m_end; ++m)
std::cout << m->str() << std::endl;

regex_match doesn't find any matching

I tested the regex [BCGRYWbcgryw]{4}\[\d\] on that site and it seems to find matches in the following: BBCC[0].GGRY[0].WWWW[soln]. It matches BBCC[0] and GGRY[0].
But when I tried to code and debug that matching, the smatch sm stays empty.
regex r("[BCGRYWbcgryw]{4}\\[\\d\\]");
string line; in >> line;
smatch sm;
regex_match(line, sm, r, regex_constants::match_any);
copy(boost::begin(sm), boost::end(sm), ostream_iterator<smatch::value_type>(cout, ", "));
Where am I going wrong?
If you don't want to match the whole input sequence, use std::regex_search, not std::regex_match:
#include <iostream>
#include <regex>
#include <iterator>
#include <algorithm>
int main()
{
using namespace std;
regex r(R"([BCGRYWbcgryw]{4}\[\d\])");
string line = "BBCC[0].GGRY[0].WWWW[soln]";
smatch sm;
regex_search(line, sm, r, regex_constants::match_any);
copy(std::begin(sm), std::end(sm), ostream_iterator<smatch::value_type>(cout, ", "));
cout << endl;
}
N.B. this also uses raw strings to simplify the regular expression.
I finally got it to work, using () to define a capture group and regex_iterator to find all substrings matching the pattern.
std::regex rstd("(\\[[0-9]\\]\\.[BCGRYWbcgryw]{4})"); // the dot needs \\. in a normal string literal
std::sregex_iterator stIterstd(line.begin(), line.end(), rstd);
std::sregex_iterator endIterstd;
for (; stIterstd != endIterstd; ++stIterstd)
{
    cout << " Whole string " << (*stIterstd)[0] << endl;
    cout << " First sub-group " << (*stIterstd)[1] << endl;
}
The output is:
Whole string [0].GGRY
First sub-group [0].GGRY
Whole string [0].WWWW
First sub-group [0].WWWW

Extracting submatches using boost regex in c++

I'm trying to extract submatches from a text file using boost regex. Currently I'm only returning the first valid line and the full line instead of the valid email address. I tried using the iterator and using submatches but I wasn't having success with it. Here is the current code:
if(Myfile.is_open()) {
boost::regex pattern("^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$");
while(getline(Myfile, line)) {
string::const_iterator start = line.begin();
string::const_iterator end = line.end();
boost::sregex_token_iterator i(start, end, pattern);
boost::sregex_token_iterator j;
while ( i != j) {
cout << *i++ << endl;
}
Myfile.close();
}
Use boost::smatch.
boost::regex pattern("what(ever) ...");
boost::smatch result;
if (boost::regex_search(s, result, pattern)) {
string submatch(result[1].first, result[1].second);
// Do whatever ...
}
const string pattern = "(abc)(def)";
const string target = "abcdef";
boost::regex regexPattern(pattern, boost::regex::extended);
boost::smatch what;
bool isMatchFound = boost::regex_match(target, what, regexPattern);
if (isMatchFound)
{
for (unsigned int i=0; i < what.size(); i++)
{
cout << "WHAT " << i << " " << what[i] << endl;
}
}
The output is the following
WHAT 0 abcdef
WHAT 1 abc
WHAT 2 def
Boost uses parenthesized submatches, and submatch 0 is always the full matched string. regex_match has to match the entire input against the pattern; if you are trying to match a substring, use regex_search instead.
The example above uses the POSIX extended regex syntax, which is selected with the boost::regex::extended parameter. Omitting that parameter selects Perl-style regex syntax. Other regex syntaxes are available.
This line:
string submatch(result[1].first, result[1].second);
causes errors in Visual C++ (I tested against 2012, but expect earlier versions do, too).
See https://groups.google.com/forum/?fromgroups#!topic/cpp-netlib/0Szv2WcgAtc for analysis.
The simplest way to convert a boost::sub_match to std::string:
boost::smatch result;
// regex_search or regex_match ...
string s = result[1];