C++: What is wrong with this vector splitting in a while loop?

This is my partial code:
if(action=="auth")
{
myfile.open("account.txt");
while(!myfile.eof())
{
getline(myfile,sline);
vector<string> y = split(sline, ':');
logincheck = "";
logincheck = y[0] + ":" + y[3];
if (sline==actionvalue)
{
sendClient = "login done#Successfully Login.";
break;
}
else
{
sendClient = "fail login#Invalid username/password.";
}
y.clear();
}
myfile.close();
}
If I leave out this line:
logincheck = y[0] + ":" + y[3];
the code runs without any segmentation fault (core dump), but as soon as I add that line, it crashes.
My account.txt is as follows:
admin:PeterSmite:hr:password
cktang:TangCK:normal:password
The split function:
std::vector<std::string> split(std::string const& str, std::string const& delimiters = "#") {
std::vector<std::string> tokens;
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos) {
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
std::vector<std::string> split(std::string const& str, char const delimiter) {
return split(str,std::string(1,delimiter));
}

You should do some basic input checking before you blithely assume that the vector contains at least 4 elements, otherwise y[3] will explode when you parse a line of input without three colons:
if (y.size() >= 4) {
// Do login check
} else {
// Invalid input
}
I'd guess that you probably have a blank line in your input.
Wrap the whole section of code that relies on reading a "a:b:c:d" line of input:
if(action=="auth") {
myfile.open("account.txt");
while(getline(myfile,sline))
{
vector<string> y = split(sline, ':');
if (y.size() >= 4) {
logincheck = "";
logincheck = y[0] + ":" + y[3];
if (sline==actionvalue) {
sendClient = "login done#Successfully Login.";
break;
} else {
sendClient = "fail login#Invalid username/password.";
}
}
}
myfile.close();
}
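As a minimal sketch of that size check in isolation: `split_fields` and `login_key` below are hypothetical names written only for this illustration (they are not the `split` from the question), and the splitter is a simplified single-character version that skips empty fields.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for this sketch: split on one character,
// skipping empty fields.
std::vector<std::string> split_fields(const std::string& line, char delim) {
    std::vector<std::string> out;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delim)) {
        if (!field.empty()) out.push_back(field);
    }
    return out;
}

// Returns "user:password" for a well-formed "a:b:c:d" line,
// or an empty string for a blank or malformed line.
std::string login_key(const std::string& line) {
    std::vector<std::string> y = split_fields(line, ':');
    if (y.size() < 4) return "";   // never touch y[3] unless it exists
    return y[0] + ":" + y[3];
}
```

With the sample file, `login_key("admin:PeterSmite:hr:password")` gives `admin:password`, while a blank line gives an empty string instead of indexing past the end of the vector. If you prefer an exception over a silent check, `y.at(3)` throws `std::out_of_range` where `y[3]` has undefined behaviour.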

The problem is the structure of your loop:
while(!myfile.eof())
{
getline(myfile,sline);
istream::eof() isn't guaranteed to return true until you attempt to read past the end of the stream. So what happens is you read 2 lines and eof() still hasn't returned true. Then you enter the loop for the 3rd time. Since you don't check for errors after the getline call, you happily access sline when its content is unspecified - it could be empty, it could still carry the content from the previous iteration, it could contain something else.
You always need to check whether the getline() call was successful before you attempt to access the string. The idiomatic way is to put it in the condition of the loop:
while (getline(myfile, sline)) { /* do your stuff */ }
This way you only enter the loop body if the read is successful.
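To see the extra iteration directly, here is a small sketch that counts loop iterations for both loop shapes. It reads from a `std::istringstream` instead of a file so it is self-contained; the function names are made up for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Counts iterations of the eof()-controlled loop from the question.
int count_with_eof(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    int n = 0;
    while (!in.eof()) {
        std::getline(in, line);   // result is not checked
        ++n;
    }
    return n;
}

// Counts iterations when the read itself is the loop condition.
int count_with_getline(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    int n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}
```

For two lines that each end in a newline, such as `"a:b:c:d\ne:f:g:h\n"`, the first function runs its body three times and the second runs it twice. The third pass of the eof() loop is the one that works on a line that was never read.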

The problem is that the call to getline that pulls the last usable line isn't setting EOF, so you do one extra loop iteration after you have gotten the last usable line. That loop operation is running on an empty sline, which causes bad things to happen, namely, split doesn't return a vector with four elements, but then you try to access those elements.
You can just use
while (getline(myfile,sline))
{
// do stuff
}
in place of
while(!myfile.eof())
{
getline(myfile,sline);
// do stuff
}

Related

How do I parse comma-delimited string in C++ with some elements being quoted with commas?

I have a comma-delimited string that I want to store in a string vector. The string and vectors are:
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v;
Ideally I want the strings 'abc' and 'test, 1' to be stored without the single quotes as below, but I can live with storing them with single quotes:
v[0] = "1";
v[1] = "10";
v[2] = "abc";
v[3] = "test, 1";
bool nextToken(const string &s, string::size_type &start, string &token)
{
token.clear();
start = s.find_first_not_of(" \t", start);
if (start == string::npos)
return false;
string::size_type end;
if (s[start] == '\'')
{
++start;
end = s.find('\'', start);
}
else
end = s.find_first_of(" \t,", start);
if (end == string::npos)
{
token = s.substr(start);
start = s.size();
}
else
{
token = s.substr(start, end-start);
if ((s[end] != ',') && ((end = s.find(',', end + 1)) == string::npos))
start = s.size();
else
start = end + 1;
}
return true;
}
string s = "1, 10, 'abc', 'test, 1'", token;
vector<string> v;
string::size_type start = 0;
while (nextToken(s, start, token))
v.push_back(token);
What you need here is a parser that splits the string the way you want. Here is a parsing function that does that:
#include <string>
#include <vector>
using namespace std;
vector<string> parse_string(string master) {
char temp; //the current character
bool encountered = false; //for checking if there is a single quote
string curr_parse; //the current string
vector<string>result; //the return vector
for (int i = 0; i < master.size(); ++i) { //while still in the string
temp = master[i]; //current character
switch (temp) { //switch depending on the character
case '\'': //if the character is a single quote
if (encountered) encountered = false; //if we already found a single quote, reset encountered
else encountered = true; //if we haven't found a single quote, set encountered to true
[[fallthrough]];
case ',': //if it is a comma
if (!encountered) { //if we have not found a single quote
result.push_back(curr_parse); //put our current string into our vector
curr_parse = ""; //reset the current string
break; //go to next character
}//if we did find a single quote, go to the default, and push_back the comma
[[fallthrough]];
default: //if it is a normal character
if (encountered && isspace(temp)) curr_parse.push_back(temp); //if we have found a single quote put the whitespace, we don't care
else if (isspace(temp)) break; //if we haven't found a single quote, trash the whitespace and go to the next character
else if (temp == '\'') break; //if the current character is a single quote, trash it and go to the next character.
else curr_parse.push_back(temp); //if all of the above failed, put the character into the current string
break; //go to the next character
}
}
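if (!curr_parse.empty()) result.push_back(curr_parse); //the last unquoted token has no trailing comma, so add it here
for (size_t i = 0; i < result.size(); ) { //remove any empty strings from the vector
if (result[i].empty()) result.erase(result.begin() + i); //erase shifts later elements down, so do not advance i
else ++i;
}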
return result;
}
This parses your string as you want it to, and returns a vector. Then, you can use it in your program:
#include <iostream>
int main() {
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v = parse_string(s);
for (int i = 0; i < v.size(); ++i) {
cout << v[i] << endl;
}
}
and it properly prints out:
1
10
abc
test, 1
A proper solution would require a parser implementation. If you need a quick hack, just write a cell-reading function. C++14's std::quoted manipulator is of great help here. The only problem is that the manipulator requires a stream; this is easily solved with istringstream, as in the second function. Note that the format of your string is CELL COMMA CELL COMMA ... CELL.
istream& get_cell(istream& is, string& s)
{
char c;
is >> c; // skips ws
is.unget(); // puts back in the stream the last read character
if (c == '\'')
return is >> quoted(s, '\'', '\\'); // the first character of the cell is ' - read quoted
else
return getline(is, s, ','), is.unget(); // read unquoted, but put back comma - we need it later, in get function
}
vector<string> get(const string& s)
{
istringstream iss{ s };
string cell;
vector<string> r;
while (get_cell(iss, cell))
{
r.push_back( cell );
char comma;
iss >> comma; // expect a cell separator
if (comma != ',')
break; // cell separator not found; we are at the end of stream/string - break the loop
}
if (char c; iss >> c) // we reached the end of what we understand - probe the end of stream
throw "ill formed";
return r;
}
And this is how you use it:
int main()
{
string s = "1, 10, 'abc', 'test, 1'";
try
{
auto v = get(s);
}
catch (const char* e)
{
cout << e;
}
}
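For readers who have not used std::quoted before, here is a minimal sketch of what it does on its own with a single-quote delimiter and a backslash escape. The helper name `read_quoted_cell` is made up for this example, and it needs a C++14 compiler.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Reads one cell from the front of `text`. If the cell starts with a
// single quote, everything up to the closing quote is read, commas and
// spaces included; otherwise this behaves like a plain `in >> cell`.
std::string read_quoted_cell(const std::string& text) {
    std::istringstream in(text);
    std::string cell;
    in >> std::quoted(cell, '\'', '\\');
    return cell;
}
```

Here `read_quoted_cell("'test, 1', 2")` yields `test, 1` with the quotes stripped, and an escaped quote inside the cell (`'it\'s'`) comes back as `it's`.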

String Tokenizer with multiple delimiters including delimiter without Boost

I need to create a string parser in C++. I tried using
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then add new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}
This works for 2X+7cos(3Y)
(tokenizer("2X+7cos(3Y)","+-/^() \t");)
But gives a runtime error for 2X
I need non Boost solution.
I tried using C++ String Toolkit (StrTk) Tokenizer
std::vector<std::string> results;
strtk::split(delimiter, source,
strtk::range_to_type_back_inserter(results),
strtk::tokenize_options::include_all_delimiters);
return results;
but it doesn't return each token as a separate string.
E.g., if I give the input 2X+3Y,
the output vector contains
2X+
3Y
What's probably happening is this is crashing when passed npos:
lastPos = str.find_first_not_of(delimiters, pos);
Just add breaks to your loop instead of relying on the while clause to break out of it.
if (pos == string::npos)
break;
lastPos = str.find_first_not_of(delimiters, pos);
if (lastPos == string::npos)
break;
pos = str.find_first_of(delimiters, lastPos);
Loop exit condition is broken:
while (string::npos != pos || string::npos != startpos)
Allows entry with, say pos = npos and startpos = 1.
So
strOne.substr(startpos, pos - startpos)
strOne.substr(1, npos - 1)
The count npos - 1 runs far past the end of the string, but substr clamps the count to the remaining length, so this call on its own survives.
If pos = npos and startpos = 0,
strOne.substr(startpos, pos - startpos)
lives, but
strOne.substr(pos, 1) == "\n"
strOne.substr(npos, 1) == "\n"
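dies, because substr throws std::out_of_range when the start position is greater than the string's length. So does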
strOne.substr(pos, 1) != " "
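A quick way to check which of these substr calls actually fails: the standard specifies that substr throws std::out_of_range only when the start position is greater than size(), and that an oversized count is clamped. The helper `substr_throws` is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Returns true if s.substr(pos, count) throws std::out_of_range.
bool substr_throws(const std::string& s, std::string::size_type pos,
                   std::string::size_type count) {
    try {
        (void)s.substr(pos, count);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For the input `"2X"`, a start of 1 with a count of `npos - 1` does not throw, a start equal to `size()` does not throw either (it returns an empty string), and a start of `npos` does throw. An uncaught exception of that kind is what shows up as the runtime error.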
Sadly I'm out of time and can't solve this right now, but QuestionC's got the right idea. Better filtering. Something along the lines of:
if (string::npos != pos)
{
if (strOne.substr(pos, 1) == "\n") // can possibly simplify this with strOne[pos] == '\n'
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne[pos] != ' ')
vS.push_back(strOne.substr(pos, 1));
}
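Putting the npos guard together with the rest of the loop, a complete tokenizer could look roughly like the sketch below. It is not the original `Tokenize` with a patch applied but a rewrite under the same apparent rules (each delimiter becomes its own token, spaces are dropped, a newline becomes the token `\n`); the name `tokenize_keep_delims` is made up here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: split `input` on any character in `delims`, keeping each
// delimiter as its own token. Spaces are dropped and a newline is
// emitted as the two-character token "\\n", as in the question.
std::vector<std::string> tokenize_keep_delims(const std::string& input,
                                              const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start < input.size()) {
        std::string::size_type pos = input.find_first_of(delims, start);
        if (pos == std::string::npos) {           // no more delimiters:
            tokens.push_back(input.substr(start)); // the rest is one token
            break;
        }
        if (pos > start)                           // text before the delimiter
            tokens.push_back(input.substr(start, pos - start));
        if (input[pos] == '\n')
            tokens.push_back("\\n");
        else if (input[pos] != ' ')
            tokens.push_back(std::string(1, input[pos]));
        start = pos + 1;                           // continue after the delimiter
    }
    return tokens;
}
```

Because `pos` is tested against npos before it is ever used as an index, an input without delimiters such as `2X` simply produces the single token `2X`, and `2X+3Y` produces `2X`, `+`, `3Y`.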
Would be great if you could share some info on your environment. Your program ran fine with an input value of 2X on my Fedora 20 using g++.
I created a little function that splits a string into substrings (which are stored in a vector), and it allows you to set which characters you want to treat as whitespace. Normal whitespace will still be treated as whitespace, so you don't have to define that. Actually, all it does is turn each character you defined as whitespace into an actual space character (' '). Then it runs the result through a stringstream to separate the substrings and store them in a vector. This may not be what you need for this particular problem, but maybe it can give you some ideas.
// split a string into its whitespace-separated substrings and store
// each substring in a vector<string>. Whitespace can be defined in argument
// w as a string (e.g. ".;,?-'")
vector<string> split(const string& s, const string& w)
{
string temp{ s };
// go through each char in temp (or s)
for (char& ch : temp) {
// check if any characters in temp (s) are whitespace defined in w
for (char white : w) {
if (ch == white)
ch = ' '; // if so, replace them with a space char (' ')
}
}
vector<string> substrings;
stringstream ss{ temp };
for (string buffer; ss >> buffer;) {
substrings.push_back(buffer);
}
return substrings;
}

c++ splitting string on non alphabetic characters

I am reading in a file line by line, which I want to split on non-alphabetic characters and, if possible, remove all non-alphabetic characters at the same time so I don't have to do it later.
I would like to use isalpha, but can't figure out how to use it with str.find() or similar functions, as those usually take a single delimiter as a string.
while(getline(fileToOpen,str))
{
unsigned int pos= 0;
string token;
//transform(str.begin(),str.end(),str.begin(),::tolower);
while (pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))
{
token = str.substr(0, pos);
//transform(str.begin(),str.end(),str.begin(),::tolower);
Node<t>* ptr=search(token,root);
if (ptr!=NULL)
{
ptr->count++;
cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token,root);
cout<<token<<" added to tree.\n";
}
ptr=NULL;
str.erase(0, pos);
}
}
My latest attempt which doesn't work... All of examples I could find were based on str.find("single delimiter")
Which is no good to me.
Found a way to use isalpha
template<typename t>
void Tree<t>::readFromFile(string filename)
{
string str;
ifstream fileToOpen(filename.c_str());
if (fileToOpen.is_open())
{
while(getline(fileToOpen,str))
{
unsigned int pos= 0;
string token;
//transform(str.begin(),str.end(),str.begin(),::tolower);
while (pos = find_if(str.begin(),str.end(),aZCheck)!=str.end()!=string::npos)
{
token = str.substr(0, pos);
transform(token.begin(),token.end(),token.begin(),::tolower);
Node<t>* ptr=search(token,root);
if (ptr!=NULL)
{
ptr->count++;
// cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token,root);
cout<<token<<" added to tree.\n";
}
ptr=NULL;
str.erase(0, pos);
}
}
fileToOpen.close();
}
else
cout<<"Unable to open file!\n";
}
template<typename t>
inline bool Tree<t>::aZCheck(char c)
{
return !isalpha(c);
}
But the issue still persists: the string is getting split into single characters instead of words. Also, is whitespace considered valid by isalpha?
#include <algorithm>
#include <cctype>
...
template<typename t>
void Tree<t>::readFromFile(std::string filename)
{
std::string str;
std::ifstream fileToOpen(filename.c_str());
if (fileToOpen.is_open())
{
for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )
{
for (pos = std::find_if(str.begin(), str.end(), ::isalpha); pos != str.end();
pos = std::find_if(prev, str.end(), ::isalpha))
{
prev = std::find_if_not(pos, str.end(), ::isalpha);
std::string token(pos, prev);
std::transform(token.begin(), token.end(), token.begin(), ::tolower);
Node<t>* ptr = search(token, root);
if (ptr != NULL)
{
ptr->count++;
// cout<< token << " already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token, root);
cout << token << " added to tree.\n";
}
}
}
fileToOpen.close();
}
else
cout<<"Unable to open file!\n";
}
Also, since you say you want to save time, it would help if your insert function did a little more: insert the value with a counter of 1 if it is not found in the tree, and simply increment the counter if it is already there. This saves you from traversing the tree twice (once in search and once in insert), which matters because your tree might be unbalanced.
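The insert-or-increment idea can be sketched with std::map standing in for the tree. This is an assumption made purely for illustration, since the `Tree` and `Node` classes from the question are not shown in full, and `insert_or_increment` is a made-up name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the tree from the question: an ordered map from word to count.
using WordCounts = std::map<std::string, int>;

// One lookup per word: a missing entry is created with count 0 by
// operator[] and then incremented to 1; an existing entry is incremented.
// Returns the new count.
int insert_or_increment(WordCounts& counts, const std::string& word) {
    return ++counts[word];
}
```

In a hand-written tree the same shape applies: walk down once, and at the point where the search would fail, create the node with a count of 1 instead of returning.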
Try this test case. There are two problems.
1 - pos is 0 when a delimiter is found at the start of the string (initially or after truncation).
This makes the while condition false, so the loop exits early. Compare against npos instead.
2 - You have to advance the position past the delimiter when you erase, otherwise
it finds the same one over and over.
string::size_type pos = 0;
string token;
string str = "Thisis(asdfasdfasdf)and!this)))";
while ((pos=str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))!= string::npos )
{
if ( pos != 0 )
{
// Found a token
token = str.substr(0, pos);
cout << "Found: " << token << endl;
}
else
{
// Found another delimiter
// Just move on to next one
}
str.erase(0, pos+1); // Always remove pos+1 to get rid of delimiter
}
// Cover the last (or only) token
if ( str.length() > 0 )
{
token = str;
cout << "Found: " << token << endl;
}
Output:
Found: Thisis
Found: asdfasdfasdf
Found: and
Found: this
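On the isalpha part of the question: whitespace is not alphabetic, so isalpha returns 0 for it, and it can serve directly as the predicate for std::find_if and std::find_if_not. Below is a self-contained sketch of that approach without the tree; `is_alpha_char` and `split_words` are hypothetical names, and the cast to unsigned char is there because passing a negative char value to isalpha is undefined behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Wrapper so that isalpha never sees a negative char value.
bool is_alpha_char(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Splits `line` into lowercase words made of alphabetic characters only.
std::vector<std::string> split_words(const std::string& line) {
    std::vector<std::string> words;
    auto first = std::find_if(line.begin(), line.end(), is_alpha_char);
    while (first != line.end()) {
        // End of the current run of letters.
        auto last = std::find_if_not(first, line.end(), is_alpha_char);
        std::string word(first, last);
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        words.push_back(word);
        // Start of the next run of letters, if any.
        first = std::find_if(last, line.end(), is_alpha_char);
    }
    return words;
}
```

Each returned word could then be handed to the tree's insert in place of the `cout` calls.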

Parse a file in c++ and ignore some characters

I'm trying to parse a file that has the following information
test_wall ; Comments!!
je,5
forward
goto,1
test_random;
je,9
I'm supposed to ignore comments after the ";" and move on to the next line. When there is a comma I'm trying to ignore the comma and store the second value.
string c;
int a;
c=getChar();
ifstream myfile (filename);
if (myfile.is_open())
{
while ( c != ';' && c != EOF)
{
c = getchar();
if( c == ',')
{
a= getChar();
}
}
}
myfile.close();
}
Here's some code. I'm not entirely sure I've understood the problem correctly, but if not, hopefully this will point you in the right direction.
ifstream myfile (filename);
if (myfile.is_open())
{
// read one line at a time until EOF
string line;
while (getline(myfile, line))
{
// does the line have a semi-colon?
size_t pos = line.find(';');
if (pos != string::npos)
{
// remove the semi-colon and everything afterwards
line = line.substr(0, pos);
}
// does the line have a comma?
pos = line.find(',');
if (pos != string::npos)
{
// get everything after the comma
line = line.substr(pos + 1);
// store the string
...
}
}
}
I've left the section commented 'store the string' blank because I'm not certain what you want to do here. Possibly you are asking to convert the string into an integer before storing it. If so then add that code, or ask if you don't know how to do that. Actually don't ask, search on stack overflow, because that question has been asked hundreds of times.
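If the goal is indeed to store the value after the comma as an integer, one possible sketch uses std::stoi. The function name `read_values`, the choice of a `std::vector<int>` as storage, and the decision to skip lines whose value is not a number are all assumptions, since the question does not specify them.

```cpp
#include <cassert>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Sketch: strip ';' comments, then collect the integer after each comma.
std::vector<int> read_values(std::istream& in) {
    std::vector<int> values;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type pos = line.find(';');
        if (pos != std::string::npos) line.erase(pos);   // drop the comment
        pos = line.find(',');
        if (pos == std::string::npos) continue;          // no value on this line
        try {
            values.push_back(std::stoi(line.substr(pos + 1)));
        } catch (const std::exception&) {
            // Not a number: skip the line (an assumption, see above).
        }
    }
    return values;
}
```

Taking a `std::istream&` means the same function works with the `ifstream` from the question and with a `std::istringstream` in a test. For the sample file it collects 5, 1 and 9.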

Array subscript operator on Vectors

I was writing code to tokenize a string on the delimiter ",".
void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
int main()
{
string str;
int test_case;
cin>>test_case;
while(test_case--)
{
vector<string> tokens;
getline(cin, str);
Tokenize(str, tokens, ",");
// Parsing the input string
cout<<tokens[0]<<endl;
}
return 0;
}
It gives segmentation fault on running. When I debugged it the line
cout<<tokens[0]<<endl
was the cause of the problem. I can't understand why, because cplusplus.com uses the [ ] operator for accessing the elements of vectors.
cin>>test_case; // this leaves a newline in the input buffer
while(test_case--)
{
vector<string> tokens;
getline(cin, str); // already found newline
Tokenize(str, tokens, ","); // passing empty string
Without looking at your Tokenize function, I would guess that an empty string results in an empty vector, which means when you print tokens[0], that element does not actually exist. You need to make sure your input buffer is empty before you call getline. You could put a call to cin.ignore() right after your number input, for example.
You could also forgo operator>>, and only use getline. Then do string to number conversions with your favorite method.
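A minimal sketch of the first suggestion, written against a generic `std::istream` so it can be tried without a terminal. The function name `read_cases` is made up, and discarding the whole rest of the count line (rather than a single character) is a choice made here, not something the question requires.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Reads a count with operator>>, discards the rest of that line
// (including the newline), then reads `count` full lines with getline.
std::vector<std::string> read_cases(std::istream& in) {
    std::vector<std::string> lines;
    int count = 0;
    if (!(in >> count)) return lines;             // no valid count
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::string line;
    while (count-- > 0 && std::getline(in, line)) lines.push_back(line);
    return lines;
}
```

Called as `read_cases(std::cin)`, the first getline now sees the first real test case instead of the empty remainder of the count line. Checking each token vector with `empty()` before indexing it is still advisable.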
Is it possible that the read using std::getline() wasn't successful? In this case the string would be empty and using the subscript operator would crash. You should always test whether reading was successful after trying to read, e.g.:
if (std::getline(std::cin, str)) {
// process the read string
}