How to find all sentences except those defined using regular expressions? - c++

The bottom line is that I need to find all the comments in some Python code and cut them out, leaving only the code itself.
But I can't do it the other way around. That is, I can find the comments themselves, but I cannot find everything except them.
I tried using a negative lookahead, "?!", with a regular expression like "(. *) (?! #. *)", but it does not work as I expected.
The code I attached also has an "else" branch, where I tried writing the two kinds of text to different variables, but for some reason execution doesn't even get there.
#include <iostream>
#include <fstream>
#include <string>
#include <regex>
int main()
{
std::string line;
std::string new_line;
std::string result;
std::string result_re;
std::string path;
std::smatch match;
std::regex re("(#.*)");
std::cout << "Enter the path\n";
std::cin >> path;
std::ifstream in(path);
if (in.is_open())
{
while (getline(in, line))
{
if (std::regex_search(line, match, re))
{
for (int i = 0; i < match.size(); i++)
result_re += match[i + 1];
result_re += "\n";
}
else
{
for (int i = 0; i < match.size(); i++)
result += match[i];
//result += "\n";
}
std::cout << line << std::endl;
}
}
in.close();
std::cout << result_re << std::endl;
std::cout << "End of program" << std::endl;
std::cout << result << std::endl;
system("pause");
return 0;
}
As I said above, I want to get everything except the comments, not the other way around.
I also need to handle multi-line comments, which are written as """Text""".
But with this implementation I can't even imagine how to do it: the program reads line by line, so a regular expression never sees a whole multi-line comment at once.
I would be grateful for your advice and help.

1. Don't try parsing your input file line by line. Instead, read in the whole text and let regex_replace remove all the comments; this way your entire program would look like this:
#include <iostream>
#include <string>
#include <fstream>
#include <iterator> // istream_iterator
#include <regex>
using namespace std; // for brevity
int main() {
cout << "Enter the path: ";
string filename;
getline(cin, filename);
string pprg{ istream_iterator<char>(ifstream{filename, ifstream::in} >> noskipws),
istream_iterator<char>{} };
pprg = regex_replace(pprg, regex{"#.*"}, "");
cout << pprg << endl;
}
2. Handling multi-line Python literals """...""" with C++ regex is much harder than the example above, because there are a few mutually exclusive requirements (imho):
the regex should be extended POSIX, but
POSIX regex does not support empty regex matches, however
crafting an RE that matches a negated sequence of characters requires a negative look-ahead assertion, which is an empty match :(
So you would need to write some programming logic of your own to remove multi-line Python text literals.
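One way to sketch that "programming logic" is a single pass over the whole text that skips `#` comments and `"""..."""` blocks and copies everything else. The helper name `strip_python_comments` is made up for this illustration, and the sketch deliberately ignores ordinary `'...'` and `"..."` string literals, so a `#` inside such a string would be cut too. It also removes every triple-quoted block, including ones used as real string values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes '#' comments and """...""" blocks from
// Python source held in one string.
// Assumption: ordinary '...' and "..." literals are not tracked, so a '#'
// inside them is treated as the start of a comment.
std::string strip_python_comments(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, 3, "\"\"\"") == 0) {
            // Skip to just past the closing triple quote (or to end of input).
            std::size_t close = src.find("\"\"\"", i + 3);
            i = (close == std::string::npos) ? src.size() : close + 3;
        } else if (src[i] == '#') {
            // Skip to the end of the line but keep the newline itself.
            std::size_t eol = src.find('\n', i);
            i = (eol == std::string::npos) ? src.size() : eol;
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Called on the string read from the file, the function returns the code with the comments cut out; line breaks are preserved, so line numbers of the remaining code do not shift for `#` comments.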

Related

Arabic regex matching - c++

I need to find the given string has arabic letters. It ranges from \u0600-\u06FF\u0750-\u077F.
I have written the below program:
std::vector<STD_STRING> strFieldvalues;
std::string pattern = "/[\u0600-\u06FF\u0750-\u077F]/";
std::string strFieldVal;
gboolArabic = false;
int i = 0;
int j = 0;
for ( ;i < fieldValues.size() && j< fieldNames.size(); i++,j++) //for loop its entering
{
strFieldVal=fieldValues[i].GetPString();
if (std::regex_match(strFieldVal, std::regex("(sub)(/[\u0600-\u06FF\u0750-\u077F]/)")))
{
gboolArabic = true;
gArabicFieldNames.push_back(fieldNames[j].GetPString());
}
}
strFieldVal arrives as <0067><062A><0627>, but execution never enters the if block.
Can anyone help?
The sample program given below works in an online compiler, but in Visual Studio it does not enter the if block. Adding screenshots.
It appears you need to remove regex delimiters on both ends of the regex and apply a + quantifier to the regex pattern because regex_match requires a full string match:
#include <iostream>
#include <regex>
int main() {
std::string strFieldVal("المتحدة");
std::regex pattern("[\u0600-\u06FF\u0750-\u077F]+");
if (std::regex_match(strFieldVal, pattern))
{
std::cout << strFieldVal << " is Arabic.\n";
}
return 0;
}
See the C++ demo, output: المتحدة is Arabic..
#include <iostream>
#include <regex>
int main() {
std::wstring strFieldVal(L"المتحدة");
std::wregex pattern(L"[\u0600-\u06FF\u0750-\u077F]+");
if (std::regex_match(strFieldVal, pattern))
{
std::wcout << strFieldVal << L" is Arabic.\n"; // wide strings need wcout
}
return 0;
}
The above one works correctly. In Visual Studio, when I add a C++ source file with this content, it asks about the encoding; I answered Yes, and then it worked perfectly.
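If the regex itself is not essential, the range test can also be written directly on code points, which sidesteps the source-file encoding question. This is only a sketch under the assumption that the text is available as a `std::u32string`; the function names are invented for illustration. Note that the sample value from the question, <0067><062A><0627>, starts with U+0067 (the Latin letter g), so an "all characters" test rejects it while a "contains" test accepts it. The same distinction applies to regex_match versus regex_search.

```cpp
#include <string>

// True if cp lies in one of the two ranges named in the question:
// U+0600..U+06FF or U+0750..U+077F.
bool is_arabic(char32_t cp) {
    return (cp >= 0x0600 && cp <= 0x06FF) || (cp >= 0x0750 && cp <= 0x077F);
}

// Hypothetical helper: at least one Arabic code point (like regex_search).
bool contains_arabic(const std::u32string& text) {
    for (char32_t cp : text) {
        if (is_arabic(cp)) return true;
    }
    return false;
}

// Hypothetical helper: non-empty and every code point Arabic
// (like regex_match with a '+' quantifier).
bool all_arabic(const std::u32string& text) {
    if (text.empty()) return false;
    for (char32_t cp : text) {
        if (!is_arabic(cp)) return false;
    }
    return true;
}
```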

Using the already-defined regex pattern in another regex pattern and a question about applying regex to file

How can I use the already-defined regex pattern in another regex pattern. For example in the following code sign and number are defined and I want to use them in defining relation:
regex sign("=<|=|>|<=|<>|>=");
regex number("^[1-9]\\d*");
regex relation(number, sign, number)
So, I need to find all matches (to the pattern like 23<=34 or 123<>2000) in the given file.
Since I haven't completed the relation, I've been testing with sign:
#include <iostream>
#include <fstream>
#include <regex>
using namespace std;
int main() {
regex sign("=<|=|>|<=|<>|>=");
regex digit("[0-9]");
regex number("^[1-9]\\d*");
//regex relation("^[1-9]\d*[=<|=|>|<=|<>|>=]^[1-9]\d*"); (this part is what I couldn't do)
string line;
ifstream fin;
fin.open("name.txt");
if (fin.good()) {
while (getline(fin, line)) {
bool match_sign = regex_search(line, sign);
if (match_sign) {
cout << line << endl; // but I need to print the match only
}
}
}
return 0;
}
When I want to print the matches in the file, it prints the whole line which contains any match. How can I make it print only the match itself but not the whole line?
Update:
#include <iostream>
#include <fstream>
#include <vector>
#include <regex>
using namespace std;
#define REGEX_SIGN "=<|=|>|<=|<>|>="
#define REGEX_DIGIT "[0-9]"
#define REGEX_NUMBER "^" REGEX_DIGIT "\\d*"
int main() {
regex sign(REGEX_SIGN);
regex digit(REGEX_DIGIT);
regex number(REGEX_NUMBER);
regex relation(REGEX_NUMBER REGEX_SIGN REGEX_NUMBER);
string line, text;
ifstream fin;
fin.open("name.txt");
if (fin.good()) {
while (getline(fin, line)) {
text += line + " ";
}
int count = 0;
string word = "";
for (int i = 0; i < text.length(); i++) {
if (text[i] == ' ') {
cout << "word = " << word << " | match: " << regex_match(word, relation) << endl;
if (regex_match(word, relation)) {
cout << word << endl;
}
word = "";
}
else {
word += text[i];
}
}
}
// cout << text << endl;
return 0;
}
Current name.txt looks like this:
But I think the regular expression is not working right:
It says no word matches. Where is the problem?
"Reusing" a smaller regex object inside a larger regex is not really possible.
The only workaround I can see is to define the regex strings as macros, and use the compiler's string-literal concatenation feature to create larger strings:
#define REGEX_SIGN "=<|=|>|<=|<>|>="
#define REGEX_DIGIT "[0-9]"
#define REGEX_NUMBER "^" REGEX_DIGIT "\\d*"
regex sign(REGEX_SIGN);
regex digit(REGEX_DIGIT);
regex number(REGEX_NUMBER);
regex relation(REGEX_NUMBER REGEX_SIGN REGEX_NUMBER);
This doesn't reuse the actual regex objects; it only creates longer literal strings from smaller ones.
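Two details matter when fragments are glued together like this: an alternation such as `=<|=|>` has to be wrapped in a group, otherwise the `|` splits the whole combined pattern, and a fragment that starts with `^` can only match at the beginning of the input, so it cannot be used for the second number. The sketch below uses `std::string` fragments instead of macros and a `std::sregex_iterator` to collect only the matched text rather than the whole line. The names `kSign`, `kNumber` and `find_relations` are assumptions made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Pattern fragments kept as plain strings so they can be concatenated.
// The alternation is wrapped in a non-capturing group, and the number
// fragment carries no '^' anchor, so it can appear mid-pattern.
const std::string kSign   = "(?:=<|=|>|<=|<>|>=)";
const std::string kNumber = "[1-9]\\d*";

// Hypothetical helper: returns every "number sign number" match in text.
std::vector<std::string> find_relations(const std::string& text) {
    static const std::regex relation(kNumber + kSign + kNumber);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), relation), end;
         it != end; ++it) {
        found.push_back(it->str());  // the matched text only, not the line
    }
    return found;
}
```

Calling `find_relations(line)` for each line read from the file and printing the returned strings gives the matches without the surrounding text.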

Code word search function c++

Below is some simple code to find 2401 in the string. I do not know in advance that the number is 2401; each digit can be anything from 0 to 9. To find the 4-digit number I want to use "DDDD", where each letter D matches one digit from 0 to 9. How do I make the program treat the letter D as a code for a single digit?
#include <string>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
std::string pattern ;
std::getline(std::cin, pattern);
std::string sentence = "where 2401 is";
//std::getline(std::cin, sentence);
int a = sentence.find(pattern,0);
int b = pattern.length();
cout << sentence.substr(a,b) << endl;
//std::cout << sentence << "\n";
}
Try using regular expressions. They can be kind of a pain in the ass, but they are pretty powerful once mastered. In your case I would recommend using regex_search(), like here:
#include <string>
#include <iostream>
#include <vector>
#include <regex>
using namespace std;
int main()
{
std::smatch m;
std::regex e ("[0-9]{4}"); // matches numbers
std::string sentence = "where 2401 is";
//int a = sentence.find(pattern,0);
//int b = pattern.length();
if (std::regex_search (sentence, m, e))
cout << m.str() << endl;
//cout << sentence.substr(a,b) << endl;
//std::cout << sentence << "\n";
}
If you want to make the exact matching user-specific you can also just ask for the number of digits in the number or the complete regular expression, etc.
Also noted:
the simple regular expression provided, [0-9]{4}, means: "any character between 0 and 9, exactly 4 times in a row". Have a look here for more information
in your question you mentioned that you wanted the compiler to do the matching. Regular expressions are not matched by the compiler, but at runtime. That means both the input string and the regular expression can be variables.
using namespace std; makes the prefix std:: unnecessary for those variable declarations
std::getline(std::cin, pattern); could be replaced by cin >> pattern;
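To keep the "DDDD" notation from the question, the template can be translated into a regular expression at runtime. The following is a minimal sketch, assuming that `D` is the only code letter and that every other character should match itself; `template_to_regex` and `find_code` are names invented for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns a template such as "DDDD" into a regex where
// each 'D' stands for one digit and every other character matches itself.
std::string template_to_regex(const std::string& tmpl) {
    // Characters that have a special meaning in the default ECMAScript grammar.
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string pattern;
    for (char c : tmpl) {
        if (c == 'D') {
            pattern += "[0-9]";
        } else {
            if (special.find(c) != std::string::npos) pattern += '\\';
            pattern += c;
        }
    }
    return pattern;
}

// Returns the first match of the template in sentence, or "" if none.
std::string find_code(const std::string& sentence, const std::string& tmpl) {
    std::smatch m;
    const std::regex re(template_to_regex(tmpl));
    return std::regex_search(sentence, m, re) ? m.str() : std::string();
}
```

With the sentence from the question, `find_code("where 2401 is", "DDDD")` yields the four-digit number. A side effect of this design is that a literal capital D can no longer be searched for; an escape convention would be needed for that.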

How to use cin with unknown input types?

I have a C++ program which needs to take user input. The user input will either be two ints (for example: 1 3) or it will be a char (for example: s).
I know I can get the two ints like this:
cin >> x >> y;
But how do I go about getting the value from cin if a char is input instead? I know cin.fail() will return true, but when I call cin.get(), it does not retrieve the character that was input.
Thanks for the help!
Use std::getline to read the input into a string, then use std::istringstream to parse the values out.
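A minimal sketch of that suggestion follows. The `Parsed` struct and the `parse_line` function are hypothetical names, and the rule that a single non-space character counts as the "char" case is an assumption about what the program should accept.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for one line of input.
struct Parsed {
    enum Kind { TwoInts, OneChar, Invalid };
    Kind kind = Invalid;
    int x = 0;
    int y = 0;
    char c = '\0';
};

// Parses a line that holds either two ints or exactly one character.
Parsed parse_line(const std::string& line) {
    Parsed p;
    std::string rest;

    // First attempt: two ints and nothing else on the line.
    std::istringstream ints(line);
    int x = 0, y = 0;
    if (ints >> x >> y && !(ints >> rest)) {
        p.kind = Parsed::TwoInts;
        p.x = x;
        p.y = y;
        return p;
    }

    // Second attempt: re-read the same line as a single character.
    std::istringstream chars(line);
    char c = '\0';
    if (chars >> c && !(chars >> rest)) {
        p.kind = Parsed::OneChar;
        p.c = c;
    }
    return p;
}
```

In the program this would be driven by `while (std::getline(std::cin, line))`, so a failed parse never leaves `std::cin` itself in a failed state.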
You can do this in C++11. This solution is robust and ignores surrounding whitespace.
This was compiled with clang++ and libc++ on Ubuntu 13.10. Note that GCC did not have a full regex implementation at the time (libstdc++ gained one in GCC 4.9), but you could use Boost.Regex as an alternative.
EDIT: Added handling of negative numbers.
#include <regex>
#include <iostream>
#include <string>
#include <utility>
using namespace std;
int main() {
regex pattern(R"(\s*(-?\d+)\s+(-?\d+)\s*|\s*([[:alpha:]])\s*)");
string input;
smatch match;
char a_char;
pair<int, int> two_ints;
while (getline(cin, input)) {
if (regex_match(input, match, pattern)) {
if (match[3].matched) {
cout << match[3] << endl;
a_char = match[3].str()[0];
}
else {
cout << match[1] << " " << match[2] << endl;
two_ints = {stoi(match[1]), stoi(match[2])};
}
}
}
}

C++ tokenize a string using a regular expression

I'm trying to teach myself some C++ from scratch at the moment.
I'm well-versed in python, perl, javascript but have only encountered C++ briefly, in a
classroom setting in the past. Please excuse the naivete of my question.
I would like to split a string using a regular expression but have not had much luck finding
a clear, definitive, efficient and complete example of how to do this in C++.
In perl this action is common, and thus can be accomplished in a trivial manner,
/home/me$ cat test.txt
this is aXstringYwith, some problems
and anotherXY line with similar issues
/home/me$ cat test.txt | perl -e'
> while(<>){
> my @toks = split(/[\sXY,]+/);
> print join(" ",@toks)."\n";
> }'
this is a string with some problems
and another line with similar issues
I'd like to know how best to accomplish the equivalent in C++.
EDIT:
I think I found what I was looking for in the boost library, as mentioned below.
boost regex-token-iterator (why don't underscores work?)
I guess I didn't know what to search for.
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char*[])
{
string s;
do{
if(argc == 1)
{
cout << "Enter text to split (or \"quit\" to exit): ";
getline(cin, s);
if(s == "quit") break;
}
else
s = "This is a string of tokens";
boost::regex re("\\s+");
boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
boost::sregex_token_iterator j;
unsigned count = 0;
while(i != j)
{
cout << *i++ << endl;
count++;
}
cout << "There were " << count << " tokens found." << endl;
}while(argc == 1);
return 0;
}
The boost libraries are usually a good choice, in this case Boost.Regex. There even is an example for splitting a string into tokens that already does what you want. Basically it comes down to something like this:
boost::regex re("[\\sXY]+");
std::string s;
while (std::getline(std::cin, s)) {
boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
boost::sregex_token_iterator j;
while (i != j) {
std::cout << *i++ << " ";
}
std::cout << std::endl;
}
If you want to minimize use of iterators, and pithify your code, the following should work:
#include <string>
#include <iostream>
#include <boost/regex.hpp>
int main()
{
const boost::regex re("[\\sXY,]+");
for (std::string s; std::getline(std::cin, s); )
{
std::cout << regex_replace(s, re, " ") << std::endl;
}
}
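Since C++11 the standard library ships `<regex>`, so the same split can be sketched without Boost. The function name `split` and the choice to drop empty pieces (for example when the input starts with a separator) are assumptions made for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits s on every match of sep and returns the non-empty pieces.
std::vector<std::string> split(const std::string& s, const std::regex& sep) {
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(s.begin(), s.end(), sep, -1), end;
    for (; it != end; ++it) {
        if (it->length() > 0) tokens.push_back(it->str());
    }
    return tokens;
}
```

For the Perl example from the question, the separator would be `std::regex("[\\sXY,]+")`, applied to each line obtained with `std::getline`.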
Unlike in Perl, regular expressions are not "built into" the C++ language itself.
Before C++11 you needed an external library, such as PCRE or Boost.Regex; since C++11 the standard library provides <regex>.
Regex is part of TR1, which is included in Visual C++ 2008 SP1 (including the Express edition) and G++ 4.3.
The header is <regex> and the namespace is std::tr1. It works great with the STL.
Getting started with C++ TR1 regular expressions
Visual C++ Standard Library : TR1 Regular Expressions