Why regex_match() and regex_search() do not work? - c++

My data file (format m;w,t,w,t,...,w,t) looks like this:
5762;895,360851.301667
5763;895,360851.301667
83495;166,360817.861111
175040156;7597,360815.840556,6905,363521.083889,774,363647.044722,20787,364348.666667,3158,364434.308333,3702,364480.726944,8965,365022.092778,1071,365043.283333,82,365544.150000,9170,365607.336667,46909,365635.057778,2165,365754.650000,895,366683.907500,121212,366689.450000,10571,366967.131944,1499,367707.580833,1790,368741.724167,7715,369115.480000
.........
I want to find the lines in which (w,t) pairs occur at least 7 times. I used this code:
ofstream MyTxtFile;
ifstream file("ipod-cascades.txt");
MyTxtFile.open("ipod-res.txt");
bool isWebId = true;
int n = 7,count=0;
string line;
string value;
smatch m;
while (getline(file, line)){
if (std::regex_search(line,m, std::regex(";([^,]*,[^,]*,){7,}"))){
count++;
std::stringstream linestream(line);
std::string tmp;
if (getline(linestream, value, ';')){
while (getline(linestream, tmp, ',')){
if (isWebId){
MyTxtFile << value << "," << tmp;
isWebId = false;
}
else{
MyTxtFile << "," << tmp << endl;
isWebId = true;
}
}
}
}
}
When I use regex_match() it does not find any line, and when I use regex_search() it finds some lines and then throws a stack overflow exception. What is the problem with my code?
By the way, I'm using VS2013.

std::regex_match only returns true if the entire string matches the pattern; that is, there must be no characters before or after the part the expression matches. Use std::regex_search to match part of a string.
Why std::regex_search gives stack overflow is not easy to see from your code excerpt. Most likely the error is a result of the processing you do if you find a match rather than from the library, though. Spin it through the debugger, and you'll quickly see the cause of the stack overflow.
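One way to take the regex engine out of the picture entirely is to count commas, since the number of (w,t) pairs follows directly from them. The following is only a sketch under the assumption that every line is well formed (m;w,t,w,t,...); countPairs and hasAtLeastPairs are hypothetical helper names, not part of the question's code. Note also that the pattern ;([^,]*,[^,]*,){7,} expects a comma after every pair, so a line with exactly seven pairs (13 commas) would not match it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts the (w,t) pairs after the ';' in a line shaped like "m;w,t,w,t,...".
// Assumption: the line is well formed, so every pair contributes two fields.
std::size_t countPairs(const std::string& line) {
    const std::size_t semi = line.find(';');
    if (semi == std::string::npos || semi + 1 >= line.size()) {
        return 0;  // no payload after the id
    }
    // n comma-separated fields contain n - 1 commas.
    const auto first = line.begin() + static_cast<std::ptrdiff_t>(semi) + 1;
    const std::size_t commas =
        static_cast<std::size_t>(std::count(first, line.end(), ','));
    return (commas + 1) / 2;
}

// True if the line holds at least n pairs.
bool hasAtLeastPairs(const std::string& line, std::size_t n) {
    return countPairs(line) >= n;
}
```

In the reading loop, the call to regex_search could then be replaced by hasAtLeastPairs(line, 7), which needs no recursion and no backtracking.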

std::regex is not fully supported in older GCC releases (before 4.9). I ran the regular expression from the terminal instead and wrote the result to a new file:
grep -E ";([^,]*,[^,]*,){7,}" file.txt>>res.txt

Related

How do I parse a line into pieces and ignore parts of it?

I am sorry, I wasn't clear previously. I have a file that includes data in the following format:
A(3)
B(4),A
C(2),A
E(5),A
G(3),A
J(8),B,H
H(7),C,E,G
I(6),G
F(5),H
...
These data represent a graph.
I will use the critical path method to calculate how to get through this text file.
The char is the step,
the int is the duration of the task,
and the remaining chars are the steps that come before the first char.
So I have created the class Task to read the file, and its constructor has the following parameters:
Tache::Tache(char step2, int duration, list<Task*> precedentTask)
{
this->step = step2;
this -> duration = duration;
for(list<Task*>::iterator it = this-> precedentTask.begin(); it != this-> precedentTask.end(); it++)
{
this-> precedentTask.push_back(*it);
}
}
In the main I added
string line;
list<Task> *allTaches = new list<Task>();
while(getline(file, line, ','))
{
//I want to be able to receive the parse line from the file and add it like
//allTaches.push_back(line)
//But the format needs to look like (Char, duration, <a list of> PrecedentChar)
//when I do
cout << line << endl;
it prints
A(3)
B(4)
A
C(2)
A
E(5)
A
}
So I am not really sure what to do.
You can use a regular expression to parse out the pieces you need and then pass them to Task.
In C++ that is done using std::regex.
The code below shows how to parse out the pieces; applying them to Task is a small step from there, but best done by you to make sure the concept is clear.
First we need a regular expression that grabs each piece. Each piece is a capture group, and all that takes is a pair of parentheses.
If we break down what you have, it is:
Something, an open paren we don't want, Something, a close paren we don't want, a comma we don't want, and Something.
In simple regex that would be:
(.*)\((.*)\),(.*)
But things are never so simple.
The first Something ends at the open paren, so we want everything up to that first open paren: ([^(]*). Square brackets [] define a character class and a leading ^ negates it, so [^(] matches any character except an open paren.
The second Something ends at the close paren, so we have ([^)]*).
The third Something follows the optional comma; we can use (.*) for it and wrap the comma and that group in an outer group made optional with * (there is likely a better way to do this).
We also need to double each \ in the C++ string literal: one level of escaping for the compiler and one for the regex.
We also need to allow for stray spaces, so we add " *" at every break.
This leads to our regex:
 *([^(]*) *\\( *([^)]*) *\\) *(, *(.*))*
Then we search; if a match is found, the pieces are in the result and we can iterate over it to get them.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// std::string seq = "A(4),B";
std::string seq = "A(4)";
try {
std::regex rgx(" *([^(]*) *\\( *([^)]*) *\\) *(, *(.*))*");
std::smatch result;
if(std::regex_search(seq, result, rgx))
{
std::cout << "Size=" << result.size() << std::endl;
for(size_t i=0; i<result.size(); ++i)
{
std::cout << result[i] << std::endl;
}
}
else
{
std::cout << "NO MATCH" << std::endl;
}
} catch (std::regex_error& e) {
std::cout << "BAD REGEX" << std::endl;
}
}
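As a possible next step, the captured pieces can be turned into typed values. The sketch below assumes single-letter step names and integer durations, as in the sample file; ParsedTask and parseTask are hypothetical names standing in for the Task class from the question, and the patterns are a tightened variant of the regex above rather than the same expression.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical plain-data stand-in for the Task/Tache class in the question.
struct ParsedTask {
    char step = '\0';
    int duration = 0;
    std::vector<char> predecessors;
};

// Parses one line such as "H(7),C,E,G". Returns false if the line does not fit.
bool parseTask(const std::string& line, ParsedTask& out) {
    // Group 1: step letter, group 2: duration digits, group 3: the rest of the line.
    static const std::regex head(R"(\s*([A-Za-z])\s*\(\s*(\d+)\s*\)(.*))");
    std::smatch m;
    if (!std::regex_match(line, m, head)) {
        return false;
    }
    out.step = m[1].str()[0];
    out.duration = std::stoi(m[2].str());
    out.predecessors.clear();

    // The tail is either empty or ",A,B,..."; pick out each letter after a comma.
    static const std::regex pred(R"(,\s*([A-Za-z]))");
    const std::string tail = m[3].str();
    for (std::sregex_iterator it(tail.begin(), tail.end(), pred), end; it != end; ++it) {
        out.predecessors.push_back((*it)[1].str()[0]);
    }
    return true;
}
```

Reading the file line by line with getline(file, line), without the ',' delimiter, and calling parseTask on each line keeps a step and its predecessors together.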
What you're actually looking to do here is create an extraction operator for your Tache object. I'm going to assume that your code looks something like this:
typedef char Task;
struct Tache {
char step;
int duration;
list<Task> precedentTask;
};
Your extraction operator will be a non-member function that takes a Tache. Its brute-force implementation will look something like this:
istream& operator>>(istream& lhs, Tache& rhs) {
string line;
getline(lhs, line, '\n');
stringstream ss(line);
ss >> rhs.step;
ss.ignore(numeric_limits<streamsize>::max(), '(');
ss >> rhs.duration;
ss.ignore(numeric_limits<streamsize>::max(), ')');
const regex re("\\s*,\\s*([a-zA-Z])");
string precedentTasks;
getline(ss, precedentTasks);
rhs.precedentTask.clear();
transform(sregex_token_iterator(cbegin(precedentTasks), cend(precedentTasks), re, 1), sregex_token_iterator(), back_insert_iterator<list<Task>>(rhs.precedentTask), [](const string& i) {
return i.front();
});
return lhs;
}

getting error using regex in C++

The given input has 4 lines, and I am supposed to find how many lines contain the word hacker.
4
I love #hacker
I just scored 27 points in the Picking Cards challenge on #Hacker
I just signed up for summer cup #hacker
interesting talk by hari, co-founder of hacker
The answer should be 4, but I get 0.
int main() {
int count = 0,t;
cin >> t;
string s;
bool ans;
while(t--){
cin >> s;
smatch sm;
regex rgx("hacker",regex_constants::icase);
ans = regex_match(s,sm,rgx);
if(ans){
count += 1;
}
}
cout << ans << endl;
return 0;
}
Your while loop only runs t times, and each iteration reads just one word, so your program currently reads only the first t words and then terminates.
regex_match only succeeds when the whole string matches, so for #hacker and #Hacker there will be no match.
I believe you want to cout count instead of ans at the end.
You should use std::getline instead to read a whole line (including whitespace).
Also, you should use std::regex_search to search for a partial match (std::regex_match only matches when the regex matches the whole string).
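Here's your code, slightly modified:
#include <iostream>
#include <limits>
#include <regex>
#include <string>
int main() {
int count = 0, t;
std::cin >> t;
// Discard the rest of the first line so the first getline does not read an empty string.
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::string s;
std::regex rgx("hacker", std::regex_constants::icase);
for(int i = 0; i < t; ++i)
{
std::getline(std::cin, s);
// Count each line at most once, however often the word occurs in it.
if(std::regex_search(s, rgx))
{
++count;
}
}
std::cout << count << std::endl;
return 0;
}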
If you change your regex as follows, you will get the expected result:
regex rgx("(.*)hacker(.*)",regex_constants::icase);
That way regex_match still compares against the whole string, but the pattern now also covers the text around the word.
Otherwise, use std::regex_search in place of std::regex_match:
ans = regex_search(s,sm,rgx);
Demo: http://coliru.stacked-crooked.com/a/f28c2e4b315f6f0a
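The difference between the two calls can be checked on the sample lines from the question. This is a small sketch, not the code behind the demo link; countLinesContaining and countLinesEqualTo are hypothetical helper names.

```cpp
#include <regex>
#include <string>
#include <vector>

// Counts the lines that contain a match anywhere (regex_search).
int countLinesContaining(const std::vector<std::string>& lines, const std::regex& re) {
    int count = 0;
    for (const std::string& line : lines) {
        if (std::regex_search(line, re)) {
            ++count;
        }
    }
    return count;
}

// Counts the lines that match from the first to the last character (regex_match).
int countLinesEqualTo(const std::vector<std::string>& lines, const std::regex& re) {
    int count = 0;
    for (const std::string& line : lines) {
        if (std::regex_match(line, re)) {
            ++count;
        }
    }
    return count;
}
```

With the pattern hacker and the icase flag, the first function counts all four sample lines and the second counts none; with .*hacker.* both count four.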
It looks like the first word is supposed to be the number of lines of input. But even though it seems you want to process four lines of input, the input says 3. (The question has since been edited.)
You are not reading lines, but strings, which translates into individual words. Use getline() to get a line of input.
while(t--){
std::getline(std::cin, s);
//...
Your regular expression is too narrow for regex_match. It will only match if the line consists only of the word "hacker". You want to see whether hacker is in the line, so allow your pattern to match the rest of the line around the word "hacker".
regex rgx(".*hacker.*",regex_constants::icase);
When you emit your answer, it seems you want to emit count, not ans.

set boost regex from foreign source

I need to parse a log and I have a regex that works well, but now I need to set the regex from a config file, and here is the problem.
int logParser()
{
std::string bd_regex; // this reads from config in other part of program
boost::regex parsReg;
//("(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])");
try
{
parsReg.assign(bd_regex, boost::regex_constants::icase);
}
catch (boost::regex_error& e)
{
cout << bd_regex << " is not a valid regular expression: \""
<< e.what() << "\"" << endl;
}
cout << parsReg << endl;
// here it looks exactly like:
// "("(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])");"
int count=0;
ifstream in;
in.open(bd_log_path.c_str());
while (!in.eof())
{
in.getline(buf, BUFSIZE-1);
std::string s = buf;
boost::smatch m;
if (boost::regex_search(s, m, parsReg)) // it doesn't obey this "if"
{
std::string name, diagnosis;
name.assign(m[2]);
diagnosis.assign(m[4]);
strcpy(bd_scan_results[count].file_name, name.c_str());
strcpy(bd_scan_results[count].out, diagnosis.c_str());
strcat(bd_scan_results[count].out, " ");
count++;
}
}
return count;
}
and I really don't know why the same regex doesn't work when I try to set it from the config variable.
Any help will be appreciated (:
On your direct question: try storing the regex without the C++ escapes in the config file:
(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])
Besides, I must say, that it looks like you wanted to match backslashes here:
C:.tmp.bd.
In the config, write:
C:\\tmp\\bd\\
In a C++ string literal that would be
"C:\\\\tmp\\\\bd\\\\"
@sehe gives the correct answer.
If this line of code were parsed by the C++ compiler,
str = "(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])";
it would turn each escaped backslash \\ into a single backslash \ and then
assign the result to the variable 'str'. The contents of 'str' would then look like this:
(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])
But you are reading this text from a file, so no such processing takes place.
You are assigning a raw line of text to 'str', a line that has not been pre-processed by the C++ compiler.
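The effect can be reproduced in a few lines. The sketch below uses std::regex instead of boost::regex to stay dependency-free, and a shortened host-name pattern instead of the full one from the question; both are substitutions made for illustration, and all names in it are hypothetical.

```cpp
#include <regex>
#include <string>

// Pattern written as an ordinary C++ literal: the compiler turns \\ into \.
const std::string fromLiteral = "([a-z]+\\.)+[a-z]{2,4}";

// The same text as it should appear in the config file (raw literal, no processing).
const std::string fromConfig = R"(([a-z]+\.)+[a-z]{2,4})";

// What the regex engine receives if the config file keeps the C++ escapes:
// "\\." now means a literal backslash followed by any character.
const std::string overEscaped = R"(([a-z]+\\.)+[a-z]{2,4})";

// True if the whole input matches the pattern text (case-insensitive).
bool matchesWholeInput(const std::string& patternText, const std::string& input) {
    const std::regex re(patternText, std::regex::icase);
    return std::regex_match(input, re);
}
```

Here fromLiteral and fromConfig hold identical text, while overEscaped no longer matches an input such as www.example.com.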

Extracting submatches using boost regex in c++

I'm trying to extract submatches from a text file using boost regex. Currently I'm only returning the first valid line and the full line instead of the valid email address. I tried using the iterator and using submatches but I wasn't having success with it. Here is the current code:
if(Myfile.is_open()) {
boost::regex pattern("^[_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4})$");
while(getline(Myfile, line)) {
string::const_iterator start = line.begin();
string::const_iterator end = line.end();
boost::sregex_token_iterator i(start, end, pattern);
boost::sregex_token_iterator j;
while ( i != j) {
cout << *i++ << endl;
}
Myfile.close();
}
Use boost::smatch.
boost::regex pattern("what(ever) ...");
boost::smatch result;
if (boost::regex_search(s, result, pattern)) {
string submatch(result[1].first, result[1].second);
// Do whatever ...
}
const string pattern = "(abc)(def)";
const string target = "abcdef";
boost::regex regexPattern(pattern, boost::regex::extended);
boost::smatch what;
bool isMatchFound = boost::regex_match(target, what, regexPattern);
if (isMatchFound)
{
for (unsigned int i=0; i < what.size(); i++)
{
cout << "WHAT " << i << " " << what[i] << endl;
}
}
The output is the following
WHAT 0 abcdef
WHAT 1 abc
WHAT 2 def
Boost uses parenthesized submatches, and the first submatch (index 0) is always the full matched string. regex_match has to match the entire line of input against the pattern; if you are trying to match a substring, use regex_search instead.
The example above uses the POSIX extended regex syntax, which is selected by the boost::regex::extended parameter. Omitting that parameter switches to Perl-style regex syntax. Other regex syntaxes are available.
This line:
string submatch(result[1].first, result[1].second);
causes errors in Visual C++ (I tested against 2012, but expect earlier versions do, too).
See https://groups.google.com/forum/?fromgroups#!topic/cpp-netlib/0Szv2WcgAtc for analysis.
The simplest way to convert boost::sub_match to std::string:
boost::smatch result;
// regex_search or regex_match ...
string s = result[1];
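To pull every address out of a line rather than the whole line, a match iterator can be used. The sketch below is written against std::regex, whose interface closely mirrors Boost.Regex, purely to keep it dependency-free; it is not the code from the answers. The unanchored, simplified pattern and the name extractEmails are assumptions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every substring of text that looks like an e-mail address.
// The pattern is a simplified, unanchored variant of the one in the question.
std::vector<std::string> extractEmails(const std::string& text) {
    static const std::regex pattern(
        R"([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,4})",
        std::regex::icase);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        // str() with no argument is sub-match 0, the whole match;
        // (*it)[n].str() would give the n-th parenthesized group instead.
        found.push_back(it->str());
    }
    return found;
}
```

Because the pattern has no ^ and $ anchors, several addresses on one line are all reported.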

I can't add a new line to c++ string

How do you add a new line to a c++ string? I'm trying to read a file but when I try to append '\n' it doesn't work.
std::string m_strFileData;
while( DataBegin != DataEnd ) {
m_strFileData += *DataBegin;
m_strFileData += '\n';
DataBegin++;
}
If you have a lot of lines to process, using stringstream could be more efficient.
ostringstream lines;
lines << "Line 1" << endl;
lines << "Line 2" << endl;
cout << lines.str(); // .str() is a string
Output:
Line 1
Line 2
Sorry about the late answer, but I had a similar problem until I realised that the Visual Studio 2010 char* visualiser ignores \r and \n characters. They are completely omitted from it.
Note: By visualiser I mean what you see when you hover over a char* (or string).
Just a guess, but perhaps you should change the character to a string:
m_strFileData += '\n';
to be this:
m_strFileData += "\n";
This would append a newline after each character or string, depending on what type DataBegin actually is. Your problem does not lie in the code example you have given. It would be more useful if you gave your expected and actual results, and the data types of the variables used.
Try this:
ifstream inFile;
inFile.open(filename);
std::string entireString = "";
std::string line;
while (getline(inFile,line))
{
entireString.append(line);
entireString.append("\n");
}