regex_match doesn't find any match - C++

I tested the regex [BCGRYWbcgryw]{4}\[\d\] on that site and it seems to find matches in the following: BBCC[0].GGRY[0].WWWW[soln]. It matches BBCC[0] and GGRY[0].
But when I coded it and debugged the matching, sm stays empty.
regex r("[BCGRYWbcgryw]{4}\\[\\d\\]");
string line; in >> line;
smatch sm;
regex_match(line, sm, r, regex_constants::match_any);
copy(boost::begin(sm), boost::end(sm), ostream_iterator<smatch::value_type>(cout, ", "));
Where am I going wrong?

If you don't want to match the whole input sequence, use std::regex_search instead of std::regex_match.
#include <iostream>
#include <regex>
#include <iterator>
#include <algorithm>
int main()
{
using namespace std;
regex r(R"([BCGRYWbcgryw]{4}\[\d\])");
string line = "BBCC[0].GGRY[0].WWWW[soln]";
smatch sm;
regex_search(line, sm, r, regex_constants::match_any);
copy(std::begin(sm), std::end(sm), ostream_iterator<smatch::value_type>(cout, ", "));
cout << endl;
}
N.B. this also uses raw strings to simplify the regular expression.
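Note that std::regex_search stops at the first match, so the program above prints only BBCC[0]. A minimal sketch of collecting every match with std::sregex_iterator follows. The helper names find_all_matches and matches_whole_input are hypothetical and exist only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every non-overlapping match of the
// pattern from the question, in order of appearance.
std::vector<std::string> find_all_matches(const std::string& line)
{
    static const std::regex r(R"([BCGRYWbcgryw]{4}\[\d\])");
    std::vector<std::string> found;
    for (std::sregex_iterator it(line.begin(), line.end(), r), end; it != end; ++it)
        found.push_back(it->str());   // str() is the whole match, same as (*it)[0]
    return found;
}

// Hypothetical helper: regex_match succeeds only when the pattern
// covers the entire input, not just a part of it.
bool matches_whole_input(const std::string& line)
{
    static const std::regex r(R"([BCGRYWbcgryw]{4}\[\d\])");
    return std::regex_match(line, r);
}
```

For the input BBCC[0].GGRY[0].WWWW[soln], find_all_matches should return BBCC[0] and GGRY[0], while matches_whole_input returns false. It returns true for an input such as BBCC[0] on its own.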

I finally got it to work, using () to define a capture group and regex_iterator to find all substrings matching the pattern.
std::regex rstd("(\\[[0-9]\\]\\.[BCGRYWbcgryw]{4})");
std::sregex_iterator stIterstd(line.begin(), line.end(), rstd);
std::sregex_iterator endIterstd;
for (; stIterstd != endIterstd; ++stIterstd)
{
cout << " Whole string " << (*stIterstd)[0] << endl;
cout << " First sub-group " << (*stIterstd)[1] << endl;
}
The output is:
Whole string [0].GGRY
First sub-group [0].GGRY
Whole string [0].WWWW
First sub-group [0].WWWW

Related

Regex parse a list from a string starting with a specific character

Is there a way to parse these lines?
line #1
f 1/2/3
line #2
f 1/2/3 4/5/6 7/8/9 10/11/12
I need to check that the string starts with the letter f, followed by the pattern (\d+)\/(\d+)\/(\d+) repeated and delimited by spaces.
I want to get the match 1/2/3 for line #1.
I want to get the matches 1/2/3 4/5/6 7/8/9 10/11/12 for line #2.
Yes, you can parse it with regex in C++ like this.
It does not work quite like regex in some other languages: you have to find
the repeated groups yourself by calling regex_search in a while loop.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
struct numbers_t
{
numbers_t(int n1, int n2, int n3) : v1{ n1 }, v2{ n2 }, v3{ n3 } {}
int v1;
int v2;
int v3;
};
auto parse_number_groups(const std::string& input)
{
static std::regex line_starts_with_f_and_space{ "^f " };
static std::regex number_group("(\\d+)/(\\d+)/(\\d+)");
std::smatch match; // local rather than static, so the function stays reentrant
std::vector<numbers_t> output;
if (std::regex_search(input, line_starts_with_f_and_space))
{
std::string::const_iterator cbegin(input.cbegin());
while (std::regex_search(cbegin, input.cend(), match, number_group))
{
// match [0] is the whole match, match[1] to match[3] are the subgroups. (check in debugger)
output.emplace_back(std::stoi(match[1]), std::stoi(match[2]), std::stoi(match[3]));
cbegin = match.suffix().first;
}
}
return output;
}
int main()
{
auto number_groups = parse_number_groups("f 1/2/3 4/5/6 7/8/9 10/11/12");
for (const auto& number_group : number_groups)
{
std::cout << number_group.v1 << ", " << number_group.v2 << ", " << number_group.v3 << std::endl;
}
return 0;
}
As f 1/2/3 xxx should result in no matches, there are two operations you need to perform:
Validate the string against the pattern that validates your string format
Extract space-separated tokens.
The validation part can be achieved with a regex like
^f\s+(\d+/\d+/\d+(?:\s+\d+/\d+/\d+)*)$
NOTE: a shorter variant, ^f((?:\s+\d+/\d+/\d+)+)$, would also validate the string, but its Group 1 starts with the whitespace after f; the longer pattern keeps that whitespace out of Group 1. Details:
^ - start of string (NOTE: it is not necessary when using the regex with std::regex_match as this method anchors the match at the start and end of string by default)
f - an f char
\s+ - one or more whitespaces
(\d+/\d+/\d+(?:\s+\d+/\d+/\d+)*) - Group 1: one or more digits and then two occurrences of / and one or more digits, and then zero or more occurrences of one or more whitespaces, one or more digits and then two occurrences of / and one or more digits
$ - end of string (NOTE: not necessary when using the regex with std::regex_match again).
The next step is to get the Group 1 value and split it on whitespace. You could use a regex iterator to extract the \d+/\d+/\d+ matches, but since the string is just a list of space-separated tokens, you can avoid a second regex.
Here is the full C++ code demo:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <sstream>
#include <iterator>
#include <algorithm>
int main()
{
const std::regex valRegex(R"(f\s+(\d+/\d+/\d+(?:\s+\d+/\d+/\d+)*))");
// const std::string foo = "f 1/2/3 4/5/6 7/8/9 10/11/12 xxx"; // => The 'f 1/2/3 4/5/6 7/8/9 10/11/12 xxx' string is not valid!
const std::string foo = "f 1/2/3 4/5/6 7/8/9 10/11/12"; // => ["1/2/3", "4/5/6", "7/8/9", "10/11/12"]
std::smatch m;
if (std::regex_match(foo, m, valRegex)) { // Run full string match
std::istringstream iss(m[1].str()); // If matched, split
std::vector<std::string> tokens; // Group 1 with whitespaces
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(tokens));
for(auto &s: tokens) { // Demo token output
std::cout << s << std::endl;
}
} else {
std::cout << "The '" << foo << "' string is not valid!" << std::endl;
}
}
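The regex-iterator alternative mentioned above could look roughly like this: validate the whole line with std::regex_match, then walk the triplets with std::sregex_iterator. The function name parse_face_line and the choice of std::array<int, 3> as the result type are assumptions made for this sketch.

```cpp
#include <array>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns one {a, b, c} triple per "a/b/c" token,
// or an empty vector when the line is not of the form "f a/b/c a/b/c ...".
std::vector<std::array<int, 3>> parse_face_line(const std::string& line)
{
    // Step 1: validate the whole line (regex_match anchors both ends).
    static const std::regex valid(R"(f(?:\s+\d+/\d+/\d+)+)");
    // Step 2: pattern for a single triplet, with one group per number.
    static const std::regex triplet(R"((\d+)/(\d+)/(\d+))");

    std::vector<std::array<int, 3>> out;
    if (!std::regex_match(line, valid))
        return out;

    for (std::sregex_iterator it(line.begin(), line.end(), triplet), end; it != end; ++it) {
        const std::smatch& m = *it;
        // std::stoi throws std::out_of_range for values that do not fit in int.
        out.push_back(std::array<int, 3>{{ std::stoi(m[1].str()),
                                           std::stoi(m[2].str()),
                                           std::stoi(m[3].str()) }});
    }
    return out;
}
```

Calling parse_face_line("f 1/2/3 4/5/6 7/8/9 10/11/12") should give four triples, and "f 1/2/3 xxx" should give an empty result because validation fails first.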

Using the already-defined regex pattern in another regex pattern and a question about applying regex to file

How can I use the already-defined regex pattern in another regex pattern. For example in the following code sign and number are defined and I want to use them in defining relation:
regex sign("=<|=|>|<=|<>|>=");
regex number("^[1-9]\\d*");
regex relation(number, sign, number)
So, I need to find all matches (to the pattern like 23<=34 or 123<>2000) in the given file.
Since I haven't completed the relation, I've been testing with sign:
#include <iostream>
#include <fstream>
#include <regex>
using namespace std;
int main() {
regex sign("=<|=|>|<=|<>|>=");
regex digit("[0-9]");
regex number("^[1-9]\\d*");
//regex relation("^[1-9]\d*[=<|=|>|<=|<>|>=]^[1-9]\d*"); (this part is what I couldn't do)
string line;
ifstream fin;
fin.open("name.txt");
if (fin.good()) {
while (getline(fin, line)) {
bool match_sign = regex_search(line, sign);
if (match_sign) {
cout << line << endl; // but I need to print the match only
}
}
}
return 0;
}
When I want to print the matches in the file, it prints the whole line which contains any match. How can I make it print only the match itself but not the whole line?
Update:
#include <iostream>
#include <fstream>
#include <vector>
#include <regex>
using namespace std;
#define REGEX_SIGN "=<|=|>|<=|<>|>="
#define REGEX_DIGIT "[0-9]"
#define REGEX_NUMBER "^" REGEX_DIGIT "\\d*"
int main() {
regex sign(REGEX_SIGN);
regex digit(REGEX_DIGIT);
regex number(REGEX_NUMBER);
regex relation(REGEX_NUMBER REGEX_SIGN REGEX_NUMBER);
string line, text;
ifstream fin;
fin.open("name.txt");
if (fin.good()) {
while (getline(fin, line)) {
text += line + " ";
}
int count = 0;
string word = "";
for (int i = 0; i < text.length(); i++) {
if (text[i] == ' ') {
cout << "word = " << word << " | match: " << regex_match(word, relation) << endl;
if (regex_match(word, relation)) {
cout << word << endl;
}
word = "";
}
else {
word += text[i];
}
}
}
// cout << text << endl;
return 0;
}
Current name.txt looks like this:
But I think the regular expression is not working right:
It says no word matches. Where is the problem?
Reusing a smaller regex object inside a larger regex is not really possible.
The only workaround I can see is to define the pattern strings as macros and use the compiler's string-literal concatenation to build larger patterns:
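#define REGEX_SIGN "(?:<=|>=|<>|=<|=|>)" // grouped so the alternation stays contained when concatenated
#define REGEX_DIGIT "[0-9]"
#define REGEX_NUMBER REGEX_DIGIT "\\d*" // no leading ^: it would land in the middle of the combined pattern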
regex sign(REGEX_SIGN);
regex digit(REGEX_DIGIT);
regex number(REGEX_NUMBER);
regex relation(REGEX_NUMBER REGEX_SIGN REGEX_NUMBER);
This doesn't reuse the actual regex objects; it only builds longer string literals from smaller ones.
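When pieces are concatenated like this, each piece has to be safe to embed: an alternation needs its own group, and an anchor such as ^ only makes sense at the very start of the combined pattern. Below is a minimal sketch that uses std::string instead of macros and returns only the matched text rather than the whole line. The names sign_re, number_re, and find_relations are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Building blocks kept as plain strings so they can be concatenated.
// Longer operators come first because ECMAScript alternation is ordered,
// and the alternation is wrapped in a non-capturing group.
const std::string sign_re   = "(?:<=|>=|<>|=<|=|>)";
const std::string number_re = "[1-9]\\d*";   // no ^ here: it would end up mid-pattern

// Hypothetical helper: returns every "number sign number" substring of text.
std::vector<std::string> find_relations(const std::string& text)
{
    static const std::regex relation(number_re + sign_re + number_re);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), relation), end; it != end; ++it)
        found.push_back(it->str());   // only the matched text, not the whole line
    return found;
}
```

Applied to each line read from the file, find_relations would yield items such as 23<=34 or 123<>2000, which can then be printed one by one.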

How to find all sentences except those defined using regular expressions?

In short, I need to find all the comments in some Python code and cut them out, leaving only the code itself.
But I can't do it the other way round. That is, I can find the comments themselves, but I cannot find everything except them.
I tried a negative lookahead, "?!", with a regular expression like "(.*)(?!#.*)", but it does not work as I expected.
In the code below there is also an "else" branch where I tried to collect the rest into a different variable, but for some reason nothing ends up there.
#include <iostream>
#include <fstream>
#include <string>
#include <regex>
int main()
{
std::string line;
std::string new_line;
std::string result;
std::string result_re;
std::string path;
std::smatch match;
std::regex re("(#.*)");
std::cout << "Enter the path\n";
std::cin >> path;
std::ifstream in(path);
if (in.is_open())
{
while (getline(in, line))
{
if (std::regex_search(line, match, re))
{
for (int i = 0; i < match.size(); i++)
result_re += match[i + 1];
result_re += "\n";
}
else
{
for (int i = 0; i < match.size(); i++)
result += match[i];
//result += "\n";
}
std::cout << line << std::endl;
}
}
in.close();
std::cout << result_re << std::endl;
std::cout << "End of program" << std::endl;
std::cout << result << std::endl;
system("pause");
return 0;
}
As I said above, I want to get everything except the comments, not the other way around.
I also need to handle multi-line comments, which are written as """Text""".
With this implementation I can't even imagine how to do that: the file is read line by line, so a regular expression never sees a whole multi-line comment.
I would be grateful for your advice and help.
1. Don't try to parse your input file line by line. Instead, read in the whole text and let the regex replace all the comments; your entire program would then look like this:
#include <iostream>
#include <string>
#include <fstream>
#include <iterator> // istream_iterator
#include <regex>
using namespace std; // for brevity
int main() {
cout << "Enter the path: ";
string filename;
getline(cin, filename);
string pprg{ istream_iterator<char>(ifstream{filename, ifstream::in} >> noskipws),
istream_iterator<char>{} };
pprg = regex_replace(pprg, regex{"#.*"}, "");
cout << pprg << endl;
}
2. Handling multi-line Python literals """...""" with C++ regex is much harder (unlike the example above). There are a few mutually exclusive requirements (imho):
the regex should be extended POSIX, but
POSIX regex does not support empty regex matches, however
crafting an RE that matches a negated sequence of characters requires a negative look-ahead assertion, which is an empty match :(
So you would need to write some programming logic of your own to remove multi-line Python text literals.
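For what it is worth, the default ECMAScript grammar of std::regex offers a lazy quantifier, and the class [\s\S] matches any character including a newline, so a simple approach may be enough for many files. The sketch below is only that: it assumes that every triple-quoted block is a comment and that # never appears inside a string literal, which real Python code does not guarantee. The function name strip_python_comments is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes """...""" blocks and # comments from source.
// Assumption: no '#' or '"""' occurs inside ordinary string literals.
std::string strip_python_comments(const std::string& source)
{
    // [\s\S]*? lazily matches any characters, newlines included,
    // up to the nearest closing """.
    static const std::regex block(R"re("""[\s\S]*?""")re");
    // '.' does not match a line terminator, so this stops at the end of the line.
    static const std::regex line("#.*");

    const std::string without_blocks = std::regex_replace(source, block, "");
    return std::regex_replace(without_blocks, line, "");
}
```

The result keeps the line breaks and any whitespace that preceded a comment. A real tool would more likely tokenize the source so that string literals are left alone.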

c++11 regex - finding all matches between two slash chars

Let's assume I have a string like: String1/String2/String3/String4
I'd like to use regex to find every match between slash characters, plus everything after the last / character, so the output would be: String2, String3, String4
smatch match_str;
regex re_str("\\/(.*)");
regex_match( s, match_str, re_str );
cout << match_str[1] << endl;
cout << match_str[2] << endl;
cout << match_str[3] << endl;
Note that regex_match requires a full string match. Also, .* matches 0 or more characters other than a newline, as many as possible (that is, it matches until the very end of the given line).
Also, / symbol in a C++ regex does not need to be escaped.
Here is a working code:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::regex r("[^/]+");
std::smatch m;
std::string s = "String1/String2/String3/String4";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i )
{
std::smatch m = *i;
std::cout << m[0] << '\n';
}
return 0;
}
Results:
String1
String2
String3
String4
If you need to specify the initial boundary, use
std::regex rex1("(?:^|/)([^/]+)");
The values will be inside m[1] then, rather than in m[0]. See another demo.
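As a rough illustration of reading the captured group instead of the whole match, std::sregex_token_iterator can be asked for submatch 1 directly. This is a sketch; the function name split_on_slash is invented for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects group 1 of every (?:^|/)([^/]+) match.
std::vector<std::string> split_on_slash(const std::string& s)
{
    static const std::regex rex1("(?:^|/)([^/]+)");
    // The last argument (1) selects capture group 1 instead of the whole match.
    std::sregex_token_iterator first(s.begin(), s.end(), rex1, 1), last;
    return std::vector<std::string>(first, last);
}
```

For String1/String2/String3/String4 this should return all four segments without the slashes. Dropping the ^ alternative from the pattern would skip String1, which is what the question asked for.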
You can use this one; each segment is captured in group 1:
\\/([^\\/]*)
Here is a way to do it using string iterators (untested).
std::string strInput = "String1/String2/String3/String4";
std::string::const_iterator start = strInput.begin();
std::string::const_iterator end = strInput.end();
std::smatch m; // not _M: names starting with an underscore and a capital letter are reserved
std::regex Rx( "/([^/]*)" );
while ( std::regex_search( start, end, m, Rx ) )
{
    std::string strSubDir = m[1].str(); // Do something with subdir
    std::cout << strSubDir << std::endl; // Debug print subdir
    start = m[0].second; // Advance the start iterator past this match
}
Output:
String2
String3
String4

How to remove first word from a string?

Let's say I have
string sentence{"Hello how are you."};
And I want string sentence to have "how are you" without the "Hello". How would I go about doing that?
I tried doing something like:
stringstream ss(sentence);
ss>> string junkWord;//to get rid of first word
But when I did:
cout<<sentence;//still prints out "Hello how are you"
It's pretty obvious that the stringstream doesn't change the actual string. I also tried using strtok but it doesn't work well with string.
Try the following
#include <iostream>
#include <string>
int main()
{
std::string sentence{"Hello how are you."};
std::string::size_type n = 0;
n = sentence.find_first_not_of( " \t", n );
n = sentence.find_first_of( " \t", n );
sentence.erase( 0, sentence.find_first_not_of( " \t", n ) );
std::cout << '\"' << sentence << "\"\n";
return 0;
}
The output is
"how are you."
str=str.substr(str.find_first_of(" \t")+1);
Tested:
string sentence="Hello how are you.";
cout<<"Before:"<<sentence<<endl;
sentence=sentence.substr(sentence.find_first_of(" \t")+1);
cout<<"After:"<<sentence<<endl;
Execution:
> ./a.out
Before:Hello how are you.
After:how are you.
This assumes the line does not start with whitespace; if it does, this does not work.
find_first_of("<list of characters>")
The list of characters in our case is a space and a tab. The call searches for the first occurrence of any of the listed characters and returns its position (an index, not an iterator). Adding 1 moves the position one character further, which is the start of the second word when the words are separated by a single character.
substr(pos) returns the substring starting at that position and running to the last character of the string.
You can for example take the remaining substring
string sentence{"Hello how are you."};
stringstream ss{sentence};
string junkWord;
ss >> junkWord;
cout<<sentence.substr(junkWord.length()+1); //string::substr
However, it also depends what you want to do further
There are countless ways to do this. I think I would go with this:
#include <iostream>
#include <string>
int main() {
std::string sentence{"Hello how are you."};
// First, find the index for the first space:
auto first_space = sentence.find(' ');
// The part of the string we want to keep
// starts at the index after the space:
auto second_word = first_space + 1;
// If you want to write it out directly, write the part of the string
// that starts at the second word and lasts until the end of the string:
std::cout.write(
sentence.data() + second_word, sentence.length() - second_word);
std::cout << std::endl;
// Or, if you want a string object, make a copy from the start of the
// second word. substr copies until the end of the string when you give
// it only one argument, like here:
std::string rest{sentence.substr(second_word)};
std::cout << rest << std::endl;
}
Of course, unless you have a really good reason not to, you should check that first_space != std::string::npos, which would mean the space was not found. The check is omitted in my sample code for clarity :)
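A minimal sketch with that check included; it also skips leading whitespace, which the simpler versions do not handle. The function name drop_first_word is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the input without its first word.
// Returns an empty string when there is no second word.
std::string drop_first_word(const std::string& sentence)
{
    const char* const ws = " \t";

    // Start of the first word (skip leading whitespace).
    const auto word_begin = sentence.find_first_not_of(ws);
    if (word_begin == std::string::npos)
        return "";                       // empty or all whitespace

    // First whitespace after the first word.
    const auto word_end = sentence.find_first_of(ws, word_begin);
    if (word_end == std::string::npos)
        return "";                       // only one word

    // Start of the second word.
    const auto rest_begin = sentence.find_first_not_of(ws, word_end);
    if (rest_begin == std::string::npos)
        return "";                       // trailing whitespace only

    return sentence.substr(rest_begin);
}
```

Each find call is checked against std::string::npos before its result is used, so inputs with no spaces or only one word fall through to an empty result instead of relying on npos + 1 wrapping around to 0.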
You could use string::find() to locate the first space. Once you have its index, then get the sub string with string::substr() from the index after the index of the space up to the end of the string.
One liner:
std::string subStr = sentence.substr(sentence.find_first_not_of(" \t\r\n", sentence.find_first_of(" \t\r\n", sentence.find_first_not_of(" \t\r\n"))));
working example:
#include <iostream>
#include <string>
int main()
{
std::string sentence{ "Hello how are you." };
char whiteSpaces[] = " \t\r\n";
std::string subStr = sentence.substr(sentence.find_first_not_of(whiteSpaces, sentence.find_first_of(whiteSpaces, sentence.find_first_not_of(whiteSpaces))));
std::cout << subStr;
std::cin.ignore();
}
Here's how to use a stringstream to extract the junkword while ignoring any space before or after (using std::ws), then get the rest of the sentence, with robust error handling....
std::string sentence{"Hello how are you."};
std::stringstream ss{sentence};
std::string junkWord;
if (ss >> junkWord >> std::ws && std::getline(ss, sentence, '\0'))
std::cout << sentence << '\n';
else
std::cerr << "the sentence didn't contain ANY words at all\n";
#include <iostream> // cout
#include <string> // string
#include <sstream> // string stream
using namespace std;
int main()
{
string testString = "Hello how are you.";
istringstream iss(testString); // note istringstream NOT sstringstream
string firstWord;
iss >> firstWord; // read the first word; extraction stops at the first whitespace
cout << "The first word in \"" << testString << "\" is \"" << firstWord << "\""<<endl;
cout << "The rest of the words is \"" <<testString.substr(firstWord.length()+1) << "\""<<endl;
return 0;
}
output
The first word in "Hello how are you." is "Hello"
The rest of the words is "how are you."