Suppose I have a string that looks like this
std::string str = "I have $(xb) books and $(cs) pens";
What is the easiest way to extract xb and cs (the characters surrounded by $( and )) from the statement?
Is it possible to do some regex magic that would return a vector containing xb and cs?
I am trying something like this:
regex rgx("$\\(*\\)");//$(*)
smatch result;
regex_search(var, result, rgx);
for(size_t i=0; i<result.size(); ++i){
cout << result[i] << endl;
}
However, I am not having any success. What I would like is xb and cs in a vector. I tried coming up with an expression but I can't figure it out.
In your regex, $ has a special meaning (it anchors the end of the input) and * applies to the escaped opening parenthesis, so the pattern does not match what you intended.
Escape the $ and use a capturing group that matches every character after the opening parenthesis up to, but not including, the closing one, i.e. use the regex \$\(([^)]*)\):
std::string str = "I have $(xb) books and $(cs) pens";
std::regex rgx("\\$\\(([^)]*)\\)");
for (auto pos = std::sregex_iterator(str.begin(), str.end(), rgx), end = std::sregex_iterator();
pos != end; ++pos) {
const std::smatch& match = *pos;
std::cout << match[1] << std::endl;
}
Similar to Parse comma-separated ints/int-ranges in C++,
I want a regex to extract edges from a string: (1,2,1) (2,4,5) (1,4,3) (3,4,10) (3,6,2) (3,5,3) (6,7,6) (4,7,4) where (Node1 number, Node2 number, distance).
I currently am using: std::regex reg_edge("\(.*?\,.*?\,.*?\)"); which does not work (as in not a single match is found).
Since this can also be an XY-Problem, I will state what I want to do: I want the user to enter edges of the graph when creating the graph.
Please suggest a correct regex, or maybe, a better way altogether.
My current code:
void Graph::setEdges() {
std::string edge_str;
std::getline(std::cin, edge_str);
std::istringstream iss(edge_str);
edge_str.clear();
while (iss >> edge_str) {
std::regex reg_edge("\(.*?\,.*?\,.*?\,\)");
auto reg_begin = std::sregex_iterator(edge_str.begin(), edge_str.end(), reg_edge);
auto reg_end = std::sregex_iterator();
for (std::sregex_iterator reg_it = reg_begin; reg_it != reg_end; reg_it++) {
std::smatch it_match = *reg_it;
}
}
}
You can use the regex \((\d+),(\d+),(\d+)\) with std::sregex_iterator. Note that you have to escape ( and ) to match them literally. Also, using a raw string literal makes regexes easier to write.
Then extract each capture group using operator[]. Group 0 is always the whole match, so you want groups 1, 2, and 3 in your case.
std::regex reg(R"(\((\d+),(\d+),(\d+)\))");
std::string str = "(1,2,1) (2,4,5) (1,4,3) (3,4,10) (3,6,2) (3,5,3) (6,7,6) (4,7,4)";
auto start = std::sregex_iterator(str.begin(), str.end(), reg);
auto end = std::sregex_iterator{};
for (std::sregex_iterator it = start; it != end; ++it)
{
std::cout << "Node1 = " << (*it)[1] << ", Node2 = " << (*it)[2]
<< ", Distance = " << (*it)[3] << std::endl;
}
I'm learning regex and C++, and I want to build a postfix expression. To do this, I want to split my string like this:
String: 56*((6+2)/(8-7)* 2^3)
List: 56 | * | ( | ( | 6 | + | 2 | ) | / | ( ....
Currently I have:
void Stack::findIT() {
std::string var = "56*((6+2)/(8-7)* 2^3)";
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r("([0-9]*|[+*\\-\\/%]|[()])");
std::smatch m;
std::regex_search(str, m, r);
for(auto v: m) std::cout << v << std::endl;
}
I want to fill:
std::stack _operators;
std::stack _operands;
To do this, I want to extract the tokens from the string with the regex.
But why do I get an empty string when I use this code?
With std::regex r("([0-9]*|[+*\\-\\/%]|[()])");, empty string is matched.
You probably want: "[0-9]+|[+*/%^()-]"
You also have to iterate for your search:
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r("[0-9]+|[+*/%^()-]");
std::smatch m;
while (std::regex_search(str, m, r)) {
std::cout << m[0] << std::endl;
str = m.suffix();
}
To collect all the matches in a string you should really be using std::sregex_iterator.
I would recommend using raw string literals so you don't have to worry about escaping in your regex: R"~()~" (the expression goes in the middle, no extra C++ escapes required).
I changed your regex slightly. In character sets you have to put - at the beginning or end (otherwise it's a range separator). I also added the ability to read decimal numbers.
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r(R"~(\d+(?:\.\d+)?|[-+*\/%^]|[()])~"); // (?:...) is a non-capturing group for the optional fraction
std::sregex_iterator m_end;
std::sregex_iterator m(std::begin(str), std::end(str), r);
for(; m != m_end; ++m)
std::cout << m->str() << std::endl;
How can I get the name of the group corresponding to the pattern match using Boost regular expressions?
The following will output the matched expression to the given pattern. But how can I get the corresponding named group?
boost::regex pattern("(?<alpha>[0-9]*\\.?[0-9]+)|(?<beta>[a-zA-Z_]+)");
string s = "67.2 x 7 I am";
string::const_iterator start = s.begin();
string::const_iterator end = s.end();
boost::sregex_token_iterator i(start, end, pattern);
boost::sregex_token_iterator j;
for ( ;i != j; ++i)
{
cout << *i << endl;
// '67.2' and '7' belongs to "alpha"
// 'x', 'I', 'am' belongs to "beta"
}
You can get it from match_results. The linked documentation is for xpressive, but the same should work for Boost.Regex.
Let's assume I have a string like: String1/String2/String3/String4
I'd like to use a regex to find every match between slash characters, plus everything after the last / character, so the output would be: String2, String3, String4
smatch match_str;
regex re_str("\\/(.*)");
regex_match( s, match_str, re_str );
cout << match_str[1] << endl;
cout << match_str[2] << endl;
cout << match_str[3] << endl;
Note that regex_match requires a full string match. Also, .* matches 0 or more characters other than a newline, as many as possible (that is, it matches until the very end of the given line).
Also, the / symbol in a C++ regex does not need to be escaped.
Here is working code:
#include <string>
#include <iostream>
#include <regex>
int main() {
std::regex r("[^/]+");
std::string s = "String1/String2/String3/String4";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i )
{
std::smatch m = *i;
std::cout << m[0] << '\n';
}
return 0;
}
Results:
String1
String2
String3
String4
If you need to specify the initial boundary, use
std::regex rex1("(?:^|/)([^/]+)");
The values will be inside m[1] then, rather than in m[0].
You can use this one (each segment ends up in capture group 1):
/([^/]*)
Here is a way to do it using string iterators (untested).
std::string strInput = "String1/String2/String3/String4";
std::string::const_iterator start = strInput.begin();
std::string::const_iterator end = strInput.end();
std::smatch m; // renamed from _M: identifiers starting with an underscore and a capital letter are reserved
std::regex rx( "/([^/]*)" );
while ( std::regex_search( start, end, m, rx ) )
{
std::string strSubDir = m[1].str(); // Do something with subdir
std::cout << strSubDir << std::endl; // Debug print subdir
start = m[0].second; // Advance the start iterator past this match
}
Output:
String2
String3
String4
I'm trying to extract submatches from a text file using boost regex. Currently I'm only returning the first valid line and the full line instead of the valid email address. I tried using the iterator and using submatches but I wasn't having success with it. Here is the current code:
if(Myfile.is_open()) {
boost::regex pattern("^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$");
while(getline(Myfile, line)) {
string::const_iterator start = line.begin();
string::const_iterator end = line.end();
boost::sregex_token_iterator i(start, end, pattern);
boost::sregex_token_iterator j;
while ( i != j) {
cout << *i++ << endl;
}
Myfile.close();
}
Use boost::smatch.
boost::regex pattern("what(ever) ...");
boost::smatch result;
if (boost::regex_search(s, result, pattern)) {
string submatch(result[1].first, result[1].second);
// Do whatever ...
}
const string pattern = "(abc)(def)";
const string target = "abcdef";
boost::regex regexPattern(pattern, boost::regex::extended);
boost::smatch what;
bool isMatchFound = boost::regex_match(target, what, regexPattern);
if (isMatchFound)
{
for (unsigned int i=0; i < what.size(); i++)
{
cout << "WHAT " << i << " " << what[i] << endl;
}
}
The output is the following
WHAT 0 abcdef
WHAT 1 abc
WHAT 2 def
Boost uses parenthesized submatches, and the first submatch (index 0) is always the full matched string. regex_match has to match the entire input against the pattern; if you are trying to match a substring, use regex_search instead.
The example above uses the POSIX extended regex syntax, which is selected with the boost::regex::extended parameter. Omitting that parameter switches to Perl-style regex syntax. Other regex syntaxes are available.
This line:
string submatch(result[1].first, result[1].second);
causes errors in Visual C++ (I tested against 2012, but I expect earlier versions do, too).
See https://groups.google.com/forum/?fromgroups#!topic/cpp-netlib/0Szv2WcgAtc for analysis.
The simplest way to convert a boost::sub_match to a std::string:
boost::smatch result;
// regex_search or regex_match ...
string s = result[1];