Extracting data from an std::string using regex - c++

Similar to Parse comma-separated ints/int-ranges in C++,
I want a regex to extract edges from a string such as (1,2,1) (2,4,5) (1,4,3) (3,4,10) (3,6,2) (3,5,3) (6,7,6) (4,7,4), where each triple is (Node1 number, Node2 number, distance).
I am currently using std::regex reg_edge("\(.*?\,.*?\,.*?\)");, but it does not work: not a single match is found.
Since this may be an XY problem, here is what I actually want to do: let the user enter the edges of the graph when creating the graph.
Please suggest a correct regex, or maybe a better way altogether.
My current code:
void Graph::setEdges() {
    std::string edge_str;
    std::getline(std::cin, edge_str);
    std::istringstream iss(edge_str);
    edge_str.clear();
    while (iss >> edge_str) {
        std::regex reg_edge("\(.*?\,.*?\,.*?\,\)");
        auto reg_begin = std::sregex_iterator(edge_str.begin(), edge_str.end(), reg_edge);
        auto reg_end = std::sregex_iterator();
        for (std::sregex_iterator reg_it = reg_begin; reg_it != reg_end; reg_it++) {
            std::smatch it_match = *reg_it;
        }
    }
}

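You can use the regex \((\d+),(\d+),(\d+)\) with std::sregex_iterator. Note that you have to escape ( and ) to match them literally. Also, a raw string literal makes regexes easier to write: in an ordinary string literal, \( is not a standard escape sequence, so compilers typically drop the backslash (with a warning) and it never reaches the regex engine. In addition, the regex in your setEdges has an extra \, before \), so it requires three commas inside the parentheses, while each edge has only two.
Then extract each capturing group using operator[]. Group 0 is always the whole match, so you want groups 1, 2, and 3 in your case.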
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::regex reg(R"(\((\d+),(\d+),(\d+)\))");
    std::string str = "(1,2,1) (2,4,5) (1,4,3) (3,4,10) (3,6,2) (3,5,3) (6,7,6) (4,7,4)";
    auto start = std::sregex_iterator(str.begin(), str.end(), reg);
    auto end = std::sregex_iterator{};
    for (std::sregex_iterator it = start; it != end; ++it)
    {
        std::cout << "Node1 = " << (*it)[1] << ", Node2 = " << (*it)[2]
                  << ", Distance = " << (*it)[3] << std::endl;
    }
}

Related

Regex expression for matching $(..)

Suppose I have a string that looks like this
std::string str = "I have $(xb) books and $(cs) pens";
What is the easiest way to extract xb and cs (the characters surrounded by $()) from the statement?
Is it possible to do some regex magic that would return a vector that contains xb and cs ?
I am trying something like this
regex rgx("$\\(*\\)");//$(*)
smatch result;
regex_search(var, result, rgx);
for(size_t i=0; i<result.size(); ++i){
cout << result[i] << endl;
}
However, I am not having any success. What I would like is xb and cs in a vector. I tried coming up with an expression, but I can't figure it out.
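In your regex, $ has a special meaning (it is an end-of-input anchor, so it must be escaped as \$ to match a literal dollar sign), and * applies to the escaped opening parenthesis rather than to the text between the parentheses.
You could use a capturing group that matches every character after the opening parenthesis up to the first closing parenthesis, i.e. use the regex \$\(([^)]*)\):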
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string str = "I have $(xb) books and $(cs) pens";
    std::regex rgx("\\$\\(([^)]*)\\)");
    for (auto pos = std::sregex_iterator(str.begin(), str.end(), rgx), end = std::sregex_iterator();
         pos != end; ++pos) {
        const std::smatch& match = *pos;
        std::cout << match[1] << std::endl;
    }
}

Add to a stack with a regex

I'm learning regex and C++, and I want to build a postfix expression. To do this, I want to split my string like this:
String: 56*((6+2)/(8-7)* 2^3)
List: 56 | * | ( | ( | 6 | + | 2 | ) | / | ( ....
Currently, I have:
void Stack::findIT() {
std::string var = "56*((6+2)/(8-7)* 2^3)";
std::string str("56*((6+2)/(8-7)* 2^3)");
std::regex r("([0-9]*|[+*\\-\\/%]|[()])");
std::smatch m;
std::regex_search(str, m, r);
for(auto v: m) std::cout << v << std::endl;
}
I want to fill these stacks:
std::stack _operators;
std::stack _operands;
To do that, I want to split the string with the regex.
But why do I get an empty string when I use this code?
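With std::regex r("([0-9]*|[+*\\-\\/%]|[()])");, the empty string can be matched: [0-9]* means zero or more digits, so that alternative succeeds with an empty match wherever the text does not start with a digit.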
You probably want: "[0-9]+|[+*/%^()-]"
You also have to loop, because a single regex_search only finds the first match:
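#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string str("56*((6+2)/(8-7)* 2^3)");
    std::regex r("[0-9]+|[+*/%^()-]");
    std::smatch m;
    while (std::regex_search(str, m, r)) {
        std::cout << m[0] << std::endl;
        str = m.suffix(); // continue searching after the current match
    }
}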
To collect all the matches in a string you should really be using std::sregex_iterator.
I would recommend using raw string literals so you don't have to worry about escaping in your regex: R"~()~" (the expression goes in the middle; no escapes are required).
I changed your regex slightly. In character sets you have to put - at the beginning or end (otherwise it's a range separator). I also added the ability to read decimal numbers.
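#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string str("56*((6+2)/(8-7)* 2^3)");
    // (?:...) is a non-capturing group for the optional decimal part.
    std::regex r(R"~(\d+(?:\.\d+)?|[-+*/%^]|[()])~");
    std::sregex_iterator m_end;
    std::sregex_iterator m(std::begin(str), std::end(str), r);
    for (; m != m_end; ++m)
        std::cout << m->str() << std::endl;
}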

C++ Compare and replace last character of stringstream

I would like to check whether the last character appended to the stringstream is a comma and, if it is, remove it.
std::stringstream str;
str << "[";
//loop which adds several strings separated by commas
str.seekp(-1, str.cur); // this is to remove the last comma before closing bracket
str<< "]";
The problem is that if nothing is added in the loop, the opening bracket gets removed instead. So I need a way to check whether the last character is a comma. I did it like this:
if (str.str().substr(str.str().length() - 1) == ",")
{
str.seekp(-1, str.cur);
}
But I don't feel very good about this. Is there a better way to do this?
About the loop:
The loop tokenizes a set of commands received through a socket and formats them to send to another program through another socket. Each command ends with an OVER flag.
std::regex tok_pat("[^\\[\\\",\\]]+");
std::sregex_token_iterator tok_it(command.begin(), command.end(), tok_pat);
std::sregex_token_iterator tok_end;
std::string baseStr = tok_end == ++tok_it ? "" : std::string(*tok_it);
while (baseStr == "OVER")
{
//extract command parameters
str << "extracted_parameters" << ","
}
When I want to put something like a space or a comma between the items of a list, I usually write the loop like this:
#include <iostream>

int main()
{
    // initially the separator is empty
    auto sep = "";
    for(int i = 0; i < 5; ++i)
    {
        std::cout << sep << i;
        sep = ", "; // make the separator a comma after the first item
    }
}
Output:
0, 1, 2, 3, 4
If you want it to be more efficient, you can output the first item with an if before entering the loop, then output the rest of the items in the loop, like this:
#include <iostream>

int main()
{
    int n;
    std::cin >> n;
    int i = 0;
    if(i < n) // check for no output
        std::cout << i;
    for(++i; i < n; ++i) // rest of the output (if any)
        std::cout << ", " << i; // separate these
}
In your situation the first solution could work like this:
std::regex tok_pat("[^\\[\\\",\\]]+");
std::sregex_token_iterator tok_it(command.begin(), command.end(), tok_pat);
std::sregex_token_iterator tok_end;
std::string baseStr = tok_end == ++tok_it ? "" : std::string(*tok_it);
auto sep = ""; // empty separator for first item
while (baseStr == "OVER")
{
// extract command parameters
str << sep << "extracted_parameters";
sep = ","; // make it a comma after first item
}
And the second (possibly more time efficient) solution:
std::regex tok_pat("[^\\[\\\",\\]]+");
std::sregex_token_iterator tok_it(command.begin(), command.end(), tok_pat);
std::sregex_token_iterator tok_end;
std::string baseStr = tok_end == ++tok_it ? "" : std::string(*tok_it);
if (baseStr == "OVER")
{
// extract command parameters
str << "extracted_parameters";
}
while (baseStr == "OVER")
{
// extract command parameters
str << "," << "extracted_parameters"; // add a comma after first item
}

c++11 regex - finding all matches between two slash chars

Let's assume I have a string like: String1/String2/String3/String4
I'd like to use a regex to find every match between slash characters, plus everything after the last / character, so the output would be: String2, String3, String4
smatch match_str;
regex re_str("\\/(.*)");
regex_match( s, match_str, re_str );
cout << match_str[1] << endl;
cout << match_str[2] << endl;
cout << match_str[3] << endl;
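Note that regex_match requires the whole string to match. Your string does not start with /, so the match fails; moreover, your pattern has only one capturing group, so match_str[2] and match_str[3] would be empty anyway. Also, .* matches 0 or more characters other than a newline, as many as possible (that is, it matches until the very end of the given line).
Also, the / symbol does not need to be escaped in a C++ regex.
Here is working code: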
#include <string>
#include <iostream>
#include <regex>

int main() {
    std::regex r("[^/]+");
    std::string s = "String1/String2/String3/String4";
    for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
         i != std::sregex_iterator();
         ++i)
    {
        std::smatch m = *i;
        std::cout << m[0] << '\n';
    }
    return 0;
}
Results:
String1
String2
String3
String4
If you need each match to start at the beginning of the string or right after a slash, use
std::regex rex1("(?:^|/)([^/]+)");
The values will then be in m[1] rather than in m[0].
You can use this one:
/([^/]*)
(written as the C++ string literal "/([^/]*)"); group 1 holds the text after each slash.
Here is a way to do it using string iterators (untested).
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string strInput = "String1/String2/String3/String4";
    std::string::const_iterator start = strInput.begin();
    std::string::const_iterator end = strInput.end();
    std::smatch match;
    std::regex rx("/([^/]*)");
    while (std::regex_search(start, end, match, rx))
    {
        std::string strSubDir = match[1].str(); // do something with the subdirectory
        std::cout << strSubDir << std::endl;    // debug print of the subdirectory
        start = match[0].second;                // advance the start iterator past this match
    }
}
Output:
String2
String3
String4

Extracting submatches using boost regex in c++

I'm trying to extract submatches from a text file using Boost.Regex. Currently I only get the first valid line, and I get the full line rather than just the valid email address. I tried using the iterator and using submatches, but I had no success. Here is the current code:
if(Myfile.is_open()) {
boost::regex pattern("^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$");
while(getline(Myfile, line)) {
string::const_iterator start = line.begin();
string::const_iterator end = line.end();
boost::sregex_token_iterator i(start, end, pattern);
boost::sregex_token_iterator j;
while ( i != j) {
cout << *i++ << endl;
}
Myfile.close();
}
Use boost::smatch.
boost::regex pattern("what(ever) ...");
boost::smatch result;
if (boost::regex_search(s, result, pattern)) {
string submatch(result[1].first, result[1].second);
// Do whatever ...
}
const string pattern = "(abc)(def)";
const string target = "abcdef";
boost::regex regexPattern(pattern, boost::regex::extended);
boost::smatch what;
bool isMatchFound = boost::regex_match(target, what, regexPattern);
if (isMatchFound)
{
for (unsigned int i=0; i < what.size(); i++)
{
cout << "WHAT " << i << " " << what[i] << endl;
}
}
The output is the following
WHAT 0 abcdef
WHAT 1 abc
WHAT 2 def
Boost uses parenthesized submatches, and submatch 0 is always the full matched string. regex_match has to match the entire input against the pattern; if you are trying to match a substring, use regex_search instead.
The example above uses the POSIX extended regex syntax, which is selected with the boost::regex::extended flag. Omitting that flag switches to the Perl-style syntax (the default). Other regex syntaxes are also available.
This line:
string submatch(result[1].first, result[1].second);
causes errors in Visual C++ (I tested against 2012, but I expect earlier versions do too).
See https://groups.google.com/forum/?fromgroups#!topic/cpp-netlib/0Szv2WcgAtc for analysis.
The simplest way to convert a boost::sub_match to std::string:
boost::smatch result;
// regex_search or regex_match ...
string s = result[1];