Extracting submatches using boost regex in C++

I'm trying to extract submatches from a text file using boost regex. Currently I'm only returning the first valid line and the full line instead of the valid email address. I tried using the iterator and using submatches but I wasn't having success with it. Here is the current code:
if(Myfile.is_open()) {
boost::regex pattern("^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$");
while(getline(Myfile, line)) {
string::const_iterator start = line.begin();
string::const_iterator end = line.end();
boost::sregex_token_iterator i(start, end, pattern);
boost::sregex_token_iterator j;
while ( i != j) {
cout << *i++ << endl;
}
Myfile.close();
}

Use boost::smatch.
boost::regex pattern("what(ever) ...");
boost::smatch result;
if (boost::regex_search(s, result, pattern)) {
string submatch(result[1].first, result[1].second);
// Do whatever ...
}

const string pattern = "(abc)(def)";
const string target = "abcdef";
boost::regex regexPattern(pattern, boost::regex::extended);
boost::smatch what;
bool isMatchFound = boost::regex_match(target, what, regexPattern);
if (isMatchFound)
{
for (unsigned int i=0; i < what.size(); i++)
{
cout << "WHAT " << i << " " << what[i] << endl;
}
}
The output is the following:
WHAT 0 abcdef
WHAT 1 abc
WHAT 2 def
Boost uses parenthesized submatches, and the first submatch (index 0) is always the full matched string. regex_match has to match the entire input against the pattern; if you are trying to match a substring, use regex_search instead.
The example I used above uses the posix extended regex syntax, which is specified using the boost::regex::extended parameter. Omitting that parameter changes the syntax to use perl style regex syntax. Other regex syntax is available.
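To make the regex_match / regex_search difference concrete, here is a minimal sketch. It uses std::regex instead of Boost so that it compiles without extra libraries; that swap is my assumption, and the helper names whole_match_group and partial_match_group are made up for illustration.

```cpp
#include <regex>
#include <string>

// Returns capture group 1 only if the WHOLE input matches the pattern,
// otherwise an empty string.
std::string whole_match_group(const std::string& input) {
    static const std::regex pattern("(abc)(def)");
    std::smatch what;
    if (std::regex_match(input, what, pattern)) {
        return what[1].str();
    }
    return "";
}

// Returns capture group 1 if the pattern occurs ANYWHERE in the input,
// otherwise an empty string.
std::string partial_match_group(const std::string& input) {
    static const std::regex pattern("(abc)(def)");
    std::smatch what;
    if (std::regex_search(input, what, pattern)) {
        return what[1].str();
    }
    return "";
}
```

With the input "xx abcdef yy", whole_match_group returns an empty string because of the surrounding text, while partial_match_group returns "abc".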

This line:
string submatch(result[1].first, result[1].second);
causes errors in Visual C++ (I tested against 2012, but I expect earlier versions do, too).
See https://groups.google.com/forum/?fromgroups#!topic/cpp-netlib/0Szv2WcgAtc for analysis.

The simplest way to convert boost::sub_match to std::string:
boost::smatch result;
// regex_search or regex_match ...
string s = result[1];
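For the line-by-line case in the question, the pieces above could be combined roughly as follows. This is only a sketch: it uses std::regex rather than Boost (my substitution), the function name extract_first_group is hypothetical, and the caller supplies whatever pattern is appropriate.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Scans the stream line by line and collects capture group 1 of the first
// match on each line. Lines without a match are skipped.
std::vector<std::string> extract_first_group(std::istream& in,
                                             const std::regex& pattern) {
    std::vector<std::string> found;
    std::string line;
    std::smatch result;
    while (std::getline(in, line)) {
        // regex_search accepts a match anywhere in the line
        if (std::regex_search(line, result, pattern)) {
            std::string submatch = result[1];  // implicit sub_match -> std::string
            found.push_back(submatch);
        }
    }
    return found;
}
```

Note that the stream is only read here, never closed inside the loop, so every line is visited. A deliberately simplified address pattern such as R"(([a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,4}))" could be passed in; it is an illustration, not a complete email validator.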

Related

Regex expression for matching $(..)

Suppose I have a string that looks like this
std::string str = "I have $(xb) books and $(cs) pens";
What is the easiest way to extract xb and cs (the characters surrounded by $( and )) from the statement?
Is it possible to do some regex magic that would return a vector that contains xb and cs?
I am trying something like this
regex rgx("$\\(*\\)");//$(*)
smatch result;
regex_search(var, result, rgx);
for(size_t i=0; i<result.size(); ++i){
cout << result[i] << endl;
}
However, I am not having any success. What I would like is xb and cs in a vector. I tried coming up with an expression but I can't figure it out.
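$ has a special meaning (it is an anchor) and must be escaped, and the * in your regex applies to the escaped opening parenthesis, so it means "zero or more ( characters" rather than "any text".
You could use a capturing group that starts right after the opening parenthesis and matches any characters other than a closing parenthesis, i.e. use the regex \$\(([^)]*)\):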
std::string str = "I have $(xb) books and $(cs) pens";
std::regex rgx("\\$\\(([^)]*)\\)");
for (auto pos = std::sregex_iterator(str.begin(), str.end(), rgx), end = std::sregex_iterator();
pos != end; ++pos) {
const std::smatch& match = *pos;
std::cout << match[1] << std::endl;
}
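Since the question asks for the names in a vector, the loop above could be wrapped like this. It is a small sketch under my own naming (extract_placeholders is not from the original answer); the regex is the same one as above, written as a raw string literal.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the text inside every $(...) occurrence, in order of appearance.
std::vector<std::string> extract_placeholders(const std::string& text) {
    static const std::regex rgx(R"(\$\(([^)]*)\))");
    std::vector<std::string> names;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), rgx);
         it != std::sregex_iterator(); ++it) {
        names.push_back((*it)[1].str());  // group 1 is the part between $( and )
    }
    return names;
}
```

Called with "I have $(xb) books and $(cs) pens", this returns a vector holding xb and cs.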

Extracting data from an std::string using regex

Similar to Parse comma-separated ints/int-ranges in C++,
I want a regex to extract edges from a string: (1,2,1) (2,4,5) (1,4,3) (3,4,10) (3,6,2) (3,5,3) (6,7,6) (4,7,4) where (Node1 number, Node2 number, distance).
I currently am using: std::regex reg_edge("\(.*?\,.*?\,.*?\)"); which does not work (as in not a single match is found).
Since this can also be an XY-Problem, I will state what I want to do: I want the user to enter edges of the graph when creating the graph.
Please suggest a correct regex, or maybe, a better way altogether.
My current code:
void Graph::setEdges() {
std::string edge_str;
std::getline(std::cin, edge_str);
std::istringstream iss(edge_str);
edge_str.clear();
while (iss >> edge_str) {
std::regex reg_edge("\(.*?\,.*?\,.*?\,\)");
auto reg_begin = std::sregex_iterator(edge_str.begin(), edge_str.end(), reg_edge);
auto reg_end = std::sregex_iterator();
for (std::sregex_iterator reg_it = reg_begin; reg_it != reg_end; reg_it++) {
std::smatch it_match = *reg_it;
}
}
}
You can use the regex \((\d+),(\d+),(\d+)\) with std::sregex_iterator. Note that you have to escape ( and ) to match them literally. Also, using a raw string literal makes regexes easier to write.
Then extract each capture group using operator[]. Group 0 is always the whole match, so you want groups 1, 2, and 3 in your case.
std::regex reg(R"(\((\d+),(\d+),(\d+)\))");
std::string str = "(1,2,1) (2,4,5) (1,4,3) (3,4,10) (3,6,2) (3,5,3) (6,7,6) (4,7,4)";
auto start = std::sregex_iterator(str.begin(), str.end(), reg);
auto end = std::sregex_iterator{};
for (std::sregex_iterator it = start; it != end; ++it)
{
std::cout << "Node1 = " << (*it)[1] << ", Node2 = " << (*it)[2]
<< ", Distance = " << (*it)[3] << std::endl;
}
Here's a demo.
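If the goal is to feed the edges into the graph rather than print them, the captured groups can be converted to integers. The following is a sketch only: the Edge struct and the parse_edges name are assumptions of mine and not part of the asker's Graph class.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical edge record: (Node1 number, Node2 number, distance).
struct Edge {
    int node1;
    int node2;
    int distance;
};

// Parses every "(a,b,c)" triple of non-negative integers found in the input.
// Text that does not match the pattern is ignored.
std::vector<Edge> parse_edges(const std::string& input) {
    static const std::regex reg(R"(\((\d+),(\d+),(\d+)\))");
    std::vector<Edge> edges;
    for (auto it = std::sregex_iterator(input.begin(), input.end(), reg);
         it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        // std::stoi throws std::out_of_range if a number does not fit in int
        edges.push_back({std::stoi(m[1].str()),
                         std::stoi(m[2].str()),
                         std::stoi(m[3].str())});
    }
    return edges;
}
```

Because the pattern does not allow whitespace inside the parentheses, an entry like "(1, 2, 1)" would be skipped; adding \s* around the commas would be one way to relax that.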

Getting a word or sub string from main string when char '\' from RHS is found and then erase rest

Suppose I have a string as below:
input = " \\PATH\MYFILES This is my sting "
output = MYFILES
Scanning from the right-hand side, when the first '\' character is found, I want to get the word that follows it (i.e. MYFILES) and erase the rest.
Below is the approach I tried, but it fails with a runtime error (aborted, terminated with a core dump).
Please suggest the cleanest and/or shortest way to get only a single word (i.e. MYFILES) from the above string.
I have been searching and trying for the last two days without luck. Please help.
Note: The input string in the above example is not actually hardcoded. Its contents change dynamically, but the '\' character is always present.
std::regex const r{R"~(.*[^\\]\\([^\\])+).*)~"} ;
std::string s(R"(" //PATH//MYFILES This is my sting "));
std::smatch m;
int main()
{
if(std::regex_match(s,m,r))
{
std::cout<<m[1]<<endl;
}
}
}
To erase part of a string, you have to find where that part begins and ends. Finding something inside a std::string is very easy because the class has six built-in methods for this (std::string::find_first_of, std::string::find_last_of, etc.). Here is a small example of how your problem can be solved:
#include <iostream>
#include <string>
int main() {
std::string input { " \\PATH\\MYFILES This is my sting " };
auto pos = input.find_last_of('\\');
if(pos != std::string::npos) {
input.erase(0, pos + 1);
pos = input.find_first_of(' ');
if(pos != std::string::npos)
input.erase(pos);
}
std::cout << input << std::endl;
}
Note: watch out for escape sequences, a single backslash is written as "\\" inside a string literal.
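The same idea can also be packaged as a function that leaves its input untouched. This is just a sketch; the name word_after_last_backslash is made up, and unlike the program above it returns an empty string when no backslash is present.

```cpp
#include <string>

// Returns the word that follows the last backslash, up to the next space
// (or the end of the string). Returns "" if there is no backslash.
std::string word_after_last_backslash(const std::string& input) {
    const std::string::size_type slash = input.find_last_of('\\');
    if (slash == std::string::npos) {
        return "";
    }
    const std::string::size_type start = slash + 1;
    const std::string::size_type stop = input.find_first_of(' ', start);
    // When no space follows, take everything up to the end of the string.
    const std::string::size_type length =
        (stop == std::string::npos) ? std::string::npos : stop - start;
    return input.substr(start, length);
}
```

For the input " \\\\PATH\\MYFILES This is my sting " (written as a C++ literal) the result is MYFILES.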

c++11 regex - finding all matches between two slash chars

Let's assume I have a string like: String1/String2/String3/String4
I'd like to use a regex to find every substring between slash characters, plus everything after the last / character, so the output would be: String2, String3, String4
smatch match_str;
regex re_str("\\/(.*)");
regex_match( s, match_str, re_str );
cout << match_str[1] << endl;
cout << match_str[2] << endl;
cout << match_str[3] << endl;
Note that regex_match requires a full string match. Also, .* matches 0 or more characters other than a newline, as many as possible (that is, it matches until the very end of the given line).
Also, the / symbol does not need to be escaped in a C++ regex.
Here is working code:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::regex r("[^/]+");
std::smatch m;
std::string s = "String1/String2/String3/String4";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i )
{
std::smatch m = *i;
std::cout << m[0] << '\n';
}
return 0;
}
See IDEONE demo
Results:
String1
String2
String3
String4
If you need to specify the initial boundary, use
std::regex rex1("(?:^|/)([^/]+)");
The values will be inside m[1] then, rather than in m[0]. See another demo.
You can use this one:
\\/([^\\/]*)
With a live example
Here is a way to do it using string iterators (untested).
std::string strInput = "String1/String2/String3/String4";
std::string::const_iterator start = strInput.begin();
std::string::const_iterator end = strInput.end();
std::smatch match; // renamed from _M: underscore + uppercase names are reserved
std::regex Rx( "/([^/]*)" );
while ( std::regex_search( start, end, match, Rx ) )
{
std::string strSubDir = match[1].str(); // Do something with subdir
std::cout << strSubDir << std::endl; // Debug print subdir
start = match[0].second; // Advance the start iterator past this match
}
Output:
String2
String3
String4
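Another option, not mentioned in the answers above, is std::sregex_token_iterator with submatch index -1, which yields the pieces between matches of the separator instead of the matches themselves. A minimal sketch (split_on_slash is a name chosen here for illustration):

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits the input at every '/' and returns the pieces in order.
std::vector<std::string> split_on_slash(const std::string& s) {
    static const std::regex slash("/");
    std::vector<std::string> parts;
    // Index -1 selects the text between separator matches.
    for (auto it = std::sregex_token_iterator(s.begin(), s.end(), slash, -1);
         it != std::sregex_token_iterator(); ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

This returns all four components for String1/String2/String3/String4; to get only String2, String3 and String4 as in the question, skip the first element of the result.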

Why regex_match() and regex_search() do not work?

My data file (m;w,t,w,t,w,t......,w,t) looks like this:
5762;895,360851.301667
5763;895,360851.301667
83495;166,360817.861111
175040156;7597,360815.840556,6905,363521.083889,774,363647.044722,20787,364348.666667,3158,364434.308333,3702,364480.726944,8965,365022.092778,1071,365043.283333,82,365544.150000,9170,365607.336667,46909,365635.057778,2165,365754.650000,895,366683.907500,121212,366689.450000,10571,366967.131944,1499,367707.580833,1790,368741.724167,7715,369115.480000
.........
I want to find the lines in which (w,t) pairs occur at least 7 times. I used this code:
ofstream MyTxtFile;
ifstream file("ipod-cascades.txt");
MyTxtFile.open("ipod-res.txt");
bool isWebId = true;
int n = 7,count=0;
string line;
string value;
smatch m;
while (getline(file, line)){
if (std::regex_search(line,m, std::regex(";([^,]*,[^,]*,){7,}"))){
count++;
std::stringstream linestream(line);
std::string tmp;
if (getline(linestream, value, ';')){
while (getline(linestream, tmp, ',')){
if (isWebId){
MyTxtFile << value << "," << tmp;
isWebId = false;
}
else{
MyTxtFile << "," << tmp << endl;
isWebId = true;
}
}
}
}
}
When I use 'regex_match()' it does not find any line, and when I use 'regex_search()' it finds some lines and then throws a stack overflow exception. What is the problem with my code?
By the way, I'm using VS2013.
std::regex_match will only return true if the entire string matches the pattern. That is, there must not be any characters before or after the part matched by the expression. Use std::regex_search for matching a partial string.
Why std::regex_search gives stack overflow is not easy to see from your code excerpt. Most likely the error is a result of the processing you do if you find a match rather than from the library, though. Spin it through the debugger, and you'll quickly see the cause of the stack overflow.
std::regex is not fully supported in some gcc versions (support was incomplete before GCC 4.9), so I ran the regular expression with grep in a terminal and wrote the matching lines to a new file:
grep -E ";([^,]*,[^,]*,){7,}" file.txt>>res.txt
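If the only goal is to count (w,t) pairs per line, the regex can be avoided altogether by counting commas after the semicolon. This is a different technique from the ones in the answers, sketched under the assumption that every line has the m;w,t,w,t,... layout shown in the question; count_pairs is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts complete (w,t) pairs after the ';' in a line of the form
// "m;w,t,w,t,...". Returns 0 if there is no ';' or nothing follows it.
std::size_t count_pairs(const std::string& line) {
    const std::string::size_type semi = line.find(';');
    if (semi == std::string::npos || semi + 1 >= line.size()) {
        return 0;
    }
    const auto first =
        line.begin() + static_cast<std::string::difference_type>(semi + 1);
    const auto commas = std::count(first, line.end(), ',');
    // N comma-separated fields contain N - 1 commas; two fields form a pair.
    return (static_cast<std::size_t>(commas) + 1) / 2;
}
```

A line would then be kept when count_pairs(line) >= 7. Two things may be worth knowing here: the pattern ;([^,]*,[^,]*,){7,} requires a comma after every pair, so a line with exactly seven pairs does not match it, and some std::regex implementations backtrack recursively, which can exhaust the stack on very long lines.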