Tokenize a string with stringstream where the last char is the delimiter - c++

I am reading data from a file and putting it into string tokens like so:
std::vector<Mytype> mytypes;
std::ifstream file("file.csv");
std::string line;
while (std::getline(file, line)){
    std::stringstream lineSs(line);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(lineSs, token, ',')){
        tokens.push_back(token);
    }
    Mytype mytype(tokens[0], tokens[1], tokens[2], tokens[3]);
    mytypes.push_back(mytype);
}
This is a pretty standard way of doing it. However, the data has no NULL markers; a missing value is simply empty. For example, the data may look like this:
id0,1,2,3
id1,,2,
id2,,,3
The middle line is causing me problems: nothing is pushed into my tokens vector after the "2", although there should be an empty string. Then I get out_of_range problems when I try to create an instance of Mytype.
Until now I have been checking to see if the last char of each line is a comma, and if so, appending a space to the end of the line. But I was wondering if there was a better way to do this.
Thanks.

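The difference is that for line 2 ("id1,,2,"), lineSs.eof() is still false before the last call to getline(). The trailing comma was consumed as a delimiter, but eofbit is only set when a read actually runs into the end of the stream. That last getline() then extracts nothing, which sets both eofbit and failbit, so your loop exits without pushing the empty token. So you should not stop the loop when getline() returns false (strictly speaking, getline() returns the stream, and it is the stream that evaluates to false when converted to bool); instead, stop it once lineSs.eof() returns true.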
Here is a modification of your program that shows the idea:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::string line;
    while (std::getline(std::cin, line)){
        std::stringstream lineSs(line);
        std::vector<std::string> tokens;
        do {
            std::string token;
            std::getline(lineSs, token, ',');
            tokens.push_back(token);
            // print each token together with the stream's eof and fail flags
            std::cout << "'" << token << "' " << lineSs.eof() << ' ' << lineSs.fail() << std::endl;
        } while(!lineSs.eof());
        std::cout << tokens.size() << std::endl;
    }
}
It will show "3" on the last line for "1,2,3", and "4" for "1,2,3,".
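The same eof-driven loop can be packaged as a reusable helper. This is a minimal sketch, not code from the answer: splitKeepEmpty is a hypothetical name, and the delimiter is made a parameter. It keeps empty fields both in the middle and at the end of a line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `line` on `delim`, keeping empty fields, including a trailing one.
// The loop ends on eof() rather than on getline()'s result, so the final
// getline() that extracts nothing after a trailing delimiter still yields "".
std::vector<std::string> splitKeepEmpty(const std::string& line, char delim = ',') {
    std::istringstream ss(line);
    std::vector<std::string> tokens;
    do {
        std::string token;
        std::getline(ss, token, delim);
        tokens.push_back(token);
    } while (!ss.eof());
    return tokens;
}
```

For the sample data, "id1,,2," gives {"id1", "", "2", ""} and "id2,,,3" gives {"id2", "", "", "3"}. An empty line yields a single empty token, so code that builds Mytype from four fields should still check tokens.size() before indexing.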

A simple way to add an empty string to the vector when the line ends with a comma is to check for that before you create mytype. If you add
if (!line.empty() && line.back() == ',')
    tokens.push_back("");
after your inner while loop, it will add an empty string to tokens whenever the line ends with an empty column. (The !line.empty() check matters because calling back() on an empty string is undefined behavior.)
So
while (std::getline(lineSs, token, ',')){
    tokens.push_back(token);
}
becomes
while (std::getline(lineSs, token, ',')){
    tokens.push_back(token);
}
if (!line.empty() && line.back() == ',')
    tokens.push_back("");
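Put together as a self-contained function, the trailing-comma check looks roughly like this. It is a sketch: splitFields is a hypothetical name, and the delimiter is made a parameter instead of the hard-coded ','.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Tokenize with the usual getline loop, then add the empty trailing field
// that the loop misses when the line ends with the delimiter.
std::vector<std::string> splitFields(const std::string& line, char delim = ',') {
    std::istringstream lineSs(line);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(lineSs, token, delim)) {
        tokens.push_back(token);
    }
    // Guard with !line.empty(): back() on an empty string is undefined behavior.
    if (!line.empty() && line.back() == delim) {
        tokens.push_back("");
    }
    return tokens;
}
```

Unlike appending a space to each line, this leaves the stored field values unchanged. An empty line still produces no tokens at all, so checking tokens.size() before indexing remains a good idea.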

Related

Reading file line by line and tokenizing lines

I have a file with multiple lines.
Each line contains integers separated by commas.
The following code parses only the first line, not the remaining lines. Any insight into what I am doing wrong?
void parseLines(std::ifstream &myfile){
    std::string line, token;
    std::stringstream ss;
    int i;
    vector<int> r;
    while(myfile) {
        getline(myfile, line);
        ss.str("");
        ss.str(line);
        if(myfile){
            cout << "SS" << ss.str() << endl;
            while (std::getline(ss, token, ',')){
                std::cout << token << std::endl;
            }
        }
    }
}
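Any insight into what I am doing wrong?
The state of ss needs to be reset (for example with ss.clear()) before data from the second line can be read: ss.str(line) replaces the contents but leaves the eof and fail flags set while reading the first line.
Better yet, move the construction of ss inside the loop.
While you're at it, replace while(myfile) with while(getline(myfile, line)), and move the declaration of token inside the loop.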
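#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

void parseLines(std::ifstream &myfile){
    std::string line;
    int i;
    std::vector<int> r;
    while( getline(myfile, line) )
    {
        std::stringstream ss(line);   // fresh stream per line, so fresh state flags
        std::string token;
        while (std::getline(ss, token, ',')){
            std::cout << token << std::endl;
        }
    }
}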
The issue here is that the stringstream is not local to the while loop. When you read from it the first time, you exhaust the stream, which sets the EOF flag (and the fail flag, on the read that finds nothing left). If you do not clear those flags, you will never read anything more from it, even after loading new content with str(). The simplest fix is to make the stringstream local to the loop body, so you start with a fresh one on each iteration and do not have to worry about clearing the flags. That would make your code look like this:
while (getline(myfile, line)) // also moved the line reading here to control when to stop the loop
{
    std::stringstream ss(line);
    std::string token;
    while (std::getline(ss, token, ',')){
        std::cout << token << std::endl;
    }
}
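If you would rather keep a single stringstream, the alternative is to reset its state with clear() before loading each new line. The sketch below does that and also converts the tokens to int, since the lines are said to contain integers. parseIntLines is a hypothetical name, reading from any std::istream (rather than an std::ifstream) is an assumption made so it can be tried on a string, and std::stoi throws std::invalid_argument on a non-numeric token.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse comma-separated integers, one row per line.
// Reuses one stringstream: clear() resets the eof/fail flags left over from
// the previous line; str() alone replaces the content but not the flags.
std::vector<std::vector<int>> parseIntLines(std::istream& in) {
    std::vector<std::vector<int>> rows;
    std::stringstream ss;
    std::string line, token;
    while (std::getline(in, line)) {
        ss.clear();      // reset state flags
        ss.str(line);    // load the new line
        std::vector<int> row;
        while (std::getline(ss, token, ',')) {
            row.push_back(std::stoi(token));   // throws on non-numeric tokens
        }
        rows.push_back(row);
    }
    return rows;
}
```

Removing the ss.clear() call reproduces the original symptom: only the first row gets any values.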

General CSV Parser with multiple EOL characters

I'm trying to change this function to also handle CSV files that use \r line endings. I can't figure out how to make getline() take that into account.
vector<vector<string>> Parse::parseCSV(string file)
{
    // input fstream instance
    ifstream inFile;
    inFile.open(file);
    // check for error
    if (inFile.fail()) { cerr << "Cannot open file" << endl; exit(1); }
    vector<vector<string>> data;
    string line;
    while (getline(inFile, line))
    {
        stringstream inputLine(line);
        char delimeter = ',';
        string word;
        vector<string> brokenLine;
        while (getline(inputLine, word, delimeter)) {
            word.erase(remove(word.begin(), word.end(), ' '), word.end()); // remove all spaces
            brokenLine.push_back(word);
        }
        data.push_back(brokenLine);
    }
    inFile.close();
    return data;
};
This is possibly a duplicate of "Getting std::ifstream to handle LF, CR, and CRLF?"; the top answer there is particularly good.
If you know every line ends with a \r, you can specify the getline delimiter with getline(input, data, '\r'), where input is a stream, data is a string, and the third parameter is the character to split by. You could also try something like the following at the start of the outer while loop, in place of the stringstream inputLine(line); declaration:
// at the start of the outer while loop, replacing stringstream inputLine(line);
stringstream inputLine;
size_t pos = line.find('\r');
if (pos != string::npos) {
    // copy the line, turning its first '\r' into '\n'
    inputLine << std::string(line.begin(), line.begin() + pos);
    inputLine << "\n";
    inputLine << std::string(line.begin() + pos + 1, line.end());
} else {
    inputLine << line;
}
// the rest of your code here
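A more general option is to read lines with a small helper that accepts any of the three line endings, instead of std::getline. This is a sketch under stated assumptions: getlineAnyEol is a hypothetical helper, not a standard function; parseCSV here takes a std::istream& instead of a file name so it can be fed a string; and the space-stripping step from the question is left out for brevity.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one line ending in "\n", "\r" or "\r\n"; the terminator is not stored.
// Returns false once nothing more can be read.
bool getlineAnyEol(std::istream& in, std::string& line) {
    line.clear();
    bool readSomething = false;
    char c;
    while (in.get(c)) {
        readSomething = true;
        if (c == '\n') return true;
        if (c == '\r') {
            if (in.peek() == '\n') in.get(c);  // swallow the LF of a CRLF pair
            return true;
        }
        line += c;
    }
    // get() failed at end of input. If this call consumed characters, report a
    // final unterminated line; keep only eofbit so the next call returns false.
    if (readSomething) in.clear(std::ios::eofbit);
    return readSomething;
}

// The question's loop, using the custom line reader.
std::vector<std::vector<std::string>> parseCSV(std::istream& in) {
    std::vector<std::vector<std::string>> data;
    std::string line, word;
    while (getlineAnyEol(in, line)) {
        std::istringstream inputLine(line);
        std::vector<std::string> brokenLine;
        while (std::getline(inputLine, word, ',')) brokenLine.push_back(word);
        data.push_back(brokenLine);
    }
    return data;
}
```

If your files only ever use LF or CRLF endings, a lighter alternative is to keep std::getline and drop a trailing '\r' from each line before splitting it.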

Parsing and adding string to vector

I have the string "0 1 2 3 4 " (note the space at the end), which I would like to split into a vector of strings. When I use a loop and a stringstream, the program gets stuck in an infinite loop on the last number, 4. It does not want to stop.
How can I split the string and add the parts to a vector of strings at the same time?
Please advise.
stringstream ss(currentLine);
for(int i=0;i<strlen(currentLine.c_str());i++){
    ss>>strCode;
    strLevel.push_back(strCode);
}
std::ifstream infile(filename.c_str());
std::string line;
if (infile.is_open())
{
    std::cout << "Well done! File opened successfully." << std::endl;
    while (std::getline(infile, line))
    {
        std::istringstream iss(line);
        std::vector<std::string> tokens { std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>() };
        for (auto const &token : tokens)
            if (token == "your_value") {
                // Do something....
            }
    }
}
First, we read each line with std::getline and wrap it in std::istringstream iss(line); the istream_iterator pair then splits it into whitespace-separated words and stores them in the tokens vector.
Update: thanks to Nawaz for improvement suggestions (see comments).
stringstream ss(currentLine);
while ( ss >> strCode )
strLevel.push_back(strCode);
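That should be enough: the loop stops as soon as ss >> strCode fails, so the trailing space does not produce an extra token. Your original loop runs once per character of the string, and once the last number has been read, every failed extraction leaves strCode unchanged, so "4" keeps being pushed.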
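As a self-contained sketch of that loop applied to the string from the question (splitWords is a hypothetical name):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split on whitespace. The loop condition is the extraction itself, so it
// stops as soon as >> fails; trailing spaces never produce an extra token.
std::vector<std::string> splitWords(const std::string& currentLine) {
    std::stringstream ss(currentLine);
    std::vector<std::string> strLevel;
    std::string strCode;
    while (ss >> strCode) {
        strLevel.push_back(strCode);
    }
    return strLevel;
}
```

Because operator>> skips whitespace before each extraction, leading, trailing, and repeated spaces are all ignored: "0 1 2 3 4 " yields exactly five tokens.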

Modify cin to also return the newlines

I know about getline(), but it would be nice if cin could return \n when it encounters one.
Is there any way to achieve this (or something similar)?
edit (example):
string s;
while(cin>>s){
    if(s == "\n")
        cout<<"newline! ";
    else
        cout<<s<<" ";
}
Input file (txt):
hola, em dic pere
caram, jo també .
The end result should look like:
hola, em dic pere newline! caram, jo també .
If you read individual lines, you know there is a newline after each line you read, except possibly after the last line in the file, which doesn't need to end with a newline for the read to succeed. You can detect that case with eof(): if std::getline() succeeded but eof() is set, the last line had no terminating newline. This requires the std::string version of std::getline():
for (std::string line; std::getline(in, line); )
{
    std::cout << line << (in.eof()? "": "\n");
}
This should write the stream to std::cout as it was read.
The question asks for the data to be output with each newline replaced by "newline! ". You can achieve this with:
for (std::string line; std::getline(in, line); )
{
    std::cout << line << (in.eof()? "": "newline! ");
}
If you don't care about splitting the stream into lines and just want the entire file (including all newlines), you can read the stream into a std::string:
std::string file((std::istreambuf_iterator<char>(in)),
std::istreambuf_iterator<char>());
Note, however, that this exact approach is probably fairly slow (although I know that it can be made fast). If you know that the file doesn't contain a certain character, you can also use std::getline() to read the entire file into a std::string:
std::getline(in, file, 0);
The above code assumes that your file doesn't contain any null characters.
A modification of Dietmar's answer should do the trick:
for (std::string line; std::getline(in, line); )
{
    std::istringstream iss(line);
    for (std::string word; iss >> word; ) { std::cout << word << " "; }
    if (!in.eof()) { std::cout << "newline! "; }   // this line really ended with '\n'
}
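To check the line-by-line version against the example from the question, it can be written as a function that returns the output as a string instead of printing it. This is a sketch: markNewlines is a hypothetical name, and the marker is emitted when eof() is not set after std::getline(), i.e. when the line was actually terminated by '\n'.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Echo whitespace-separated words, replacing each line break with "newline! ".
// A last line without a terminating '\n' leaves eof() set after getline(),
// so no marker is emitted for it.
std::string markNewlines(std::istream& in) {
    std::ostringstream out;
    for (std::string line; std::getline(in, line); ) {
        std::istringstream iss(line);
        for (std::string word; iss >> word; ) { out << word << " "; }
        if (!in.eof()) { out << "newline! "; }
    }
    return out.str();
}
```

Runs of spaces inside a line collapse to a single space, because the words are re-joined with one space each.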
Just for the record, I ended up using this (I wanted to post it 11h ago)
string s0, s1;
while(getline(cin,s0)){
    istringstream is(s0);
    while(is>>s1){
        cout<<s1<<" ";
    }
    cout<<"newline! ";
}

How do I access individual words after splitting a string?

std::string token, line("This is a sentence.");
std::istringstream iss(line);
getline(iss, token, ' ');
std::cout << token[0] << "\n";
This is printing individual letters. How do I get the complete words?
Updated to add:
I need to access them as words for doing something like this...
if (word[0] == "something")
do_this();
else
do_that();
std::string token, line("This is a sentence.");
std::istringstream iss(line);
getline(iss, token, ' ');
std::cout << token << "\n";
To store all the tokens:
std::vector<std::string> tokens;
while (getline(iss, token, ' '))
tokens.push_back(token);
or just:
std::vector<std::string> tokens;
while (iss >> token)
tokens.push_back(token);
Now tokens[i] is the ith token.
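A small self-contained sketch of the tokens approach, including the kind of comparison from the question's update. toWords and startsWithWord are hypothetical names; the point is that each element of the vector is a whole std::string, so == compares the complete word.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect whitespace-separated words so each one can be used as a whole.
std::vector<std::string> toWords(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    std::string token;
    while (iss >> token) tokens.push_back(token);
    return tokens;
}

// tokens[0] is a std::string, so == compares the whole first word with `word`.
bool startsWithWord(const std::string& line, const std::string& word) {
    std::vector<std::string> tokens = toWords(line);
    return !tokens.empty() && tokens[0] == word;
}
```

The empty() check avoids indexing into an empty vector when the line contains no words.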
You would first have to define what makes a word.
If it's whitespace, iss >> token is the default option:
std::string line("This is a sentence.");
std::istringstream iss(line);
std::vector<std::string> words;
std::string token;
while(iss >> token)
words.push_back(token);
This should find "This", "is", "a" and "sentence." as words.
If it's anything more complicated than whitespace, you will have to write your own lexer.
Your token variable is a std::string, not an array of strings. By using [0], you're asking for the first character of token, not the string itself.
Just print token, and call getline again to read the next word.
You've defined token to be a std::string, which uses the index operator [] to return an individual character. In order to output the entire string, avoid using the index operator.