C++ Reading file Tokens

Another request, sorry.
Right now I am reading the tokens in one by one and it works, but I want to know when there is a new line.
if my file contains
Hey Bob
Now
should give me
Hey
Bob
[NEW LINE]
Now
Is there a way to do this without using getline?

Yes. When used with a std::string, operator>> reads words separated by 'white space', and 'white space' includes the space, tab and newline characters.
If you want to read a line at a time, use std::getline().
The line can then be tokenized separately with a string stream.
std::string line;
while(std::getline(std::cin,line))
{
// If you then want to tokenize the line use a string stream:
std::stringstream lineStream(line);
std::string token;
while(lineStream >> token)
{
std::cout << "Token(" << token << ")\n";
}
std::cout << "New Line Detected\n";
}
Small addition:
Without using getline()
So you really want to be able to detect a newline. This means that newline becomes another type of token. So let's assume that your tokens are words separated by 'white space', with newline as a token of its own.
Then you can create a Token type.
Then all you have to do is write the stream operators for a token:
#include <cctype>
#include <iostream>
#include <fstream>
#include <string>
class Token
{
private:
friend std::ostream& operator<<(std::ostream&,Token const&);
friend std::istream& operator>>(std::istream&,Token&);
std::string value;
};
std::istream& operator>>(std::istream& str,Token& data)
{
// Check to make sure the stream is OK.
if (!str)
{ return str;
}
char x;
// Drop leading space
do
{
x = str.get();
}
while(str && isspace(x) && (x != '\n'));
// If the stream is done. exit now.
if (!str)
{
return str;
}
// We have skipped all white space up to the
// start of the first token. We can now modify data.
data.value ="";
// If the token is a '\n' We are finished.
if (x == '\n')
{ data.value = "\n";
return str;
}
// Otherwise read the next token in.
str.unget();
str >> data.value;
return str;
}
std::ostream& operator<<(std::ostream& str,Token const& data)
{
return str << data.value;
}
int main()
{
std::ifstream f("PLOP");
Token x;
while(f >> x)
{
std::cout << "Token(" << x << ")\n";
}
}

I don't know why you think std::getline is bad. You can still recognize newlines.
std::string token;
std::ifstream file("file.txt");
while(std::getline(file, token)) {
std::istringstream line(token);
while(line >> token) {
std::cout << "Token :" << token << std::endl;
}
if(file.unget().get() == '\n') {
std::cout << "newline found" << std::endl;
}
}

This is another cool and much less verbose way I came across to tokenize strings.
vector<string> vec; //we'll put all of the tokens in here
string token;
istringstream iss("put text here");
while ( getline(iss, token, '\n') ) {
vec.push_back(token);
}

Related

Reading a CSV file detecting last field in file

I'm trying to read a CSV file with three fields per line; the last field is an integer. My program crashes in stoi on the last line of the file, I think because that line has no trailing newline character, and I am not sure how to detect when I am on the last line. The first two getline calls read the first two fields. The third getline reads the integer field with '\n' as its delimiter, but this does not work for the very last line of input. Is there a workaround for this?
The field types I expect are [ int, string, int ], and the middle field can contain spaces, so I don't think stringstream will be effective for it.
while (! movieReader.eof() ) { // while we haven't reached end of file
stringstream ss;
getline(movieReader, buffer, ','); // get movie id and convert it to integer
ss << buffer; // converting id from string to integer
ss >> movieid;
getline(movieReader, movieName, ','); // get movie name
getline(movieReader, buffer, '\n');
pubYear = stoi(buffer); // buffer will be an integer, the publish year
auto it = analyze.getMovies().emplace(movieid, Movie(movieid, movieName, pubYear ) );
countMovies++;
}
For reading and writing objects one would idiomatically overload the stream extraction and stream insertion operators:
Sample csv:
1, The Godfather, 1972
2, The Shawshank Redemption, 1994
3, Schindler's List, 1993
4, Raging Bull, 1980
5, Citizen Kane, 1941
Code:
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
void skip_to(std::istream &is, char delim) // helper function to skip the rest of a line
{
char ch;
while ((ch = is.get()) && is && ch != delim);
}
std::istream& eat_whitespace(std::istream &is) // stream manipulator that eats up whitespace
{
char ch;
while ((ch = is.peek()) && is && std::isspace(static_cast<unsigned char>(ch)))
is.get();
return is;
}
class Movie
{
int movieid;
std::string movieName;
int pubYear;
friend std::istream& operator>>(std::istream &is, Movie &movie)
{
Movie temp; // use a temporary to not mess up movie with a half-
char ch; // extracted dataset if we fail to extract some field.
if (!(is >> temp.movieid)) // try to extract the movieid
return is;
if (!(is >> std::skipws >> ch) || ch != ',') { // read the next non-white char
is.setstate(std::ios::failbit); // and check it's a comma
return is;
}
is >> eat_whitespace; // skip all whitespace before the movieName
if (!std::getline(is, temp.movieName, ',')) { // read the movieName up to the
return is; // next comma
}
if (!(is >> temp.pubYear)) // extract the pubYear
return is;
skip_to(is, '\n'); // skip the rest of the line (or till eof())
is.clear();
movie = temp; // all went well, assign the temporary
return is;
}
friend std::ostream& operator<<(std::ostream &os, Movie const &movie)
{
os << "Nr. " << movie.movieid << ": \"" << movie.movieName << "\" (" << movie.pubYear << ')';
return os;
}
};
int main()
{
char const * movies_file_name{ "foo.txt" };
std::ifstream is{ movies_file_name };
if (!is.is_open()) {
std::cerr << "Couldn't open \"" << movies_file_name << "\"!\n\n";
return EXIT_FAILURE;
}
std::vector<Movie> movies{ std::istream_iterator<Movie>{is},
std::istream_iterator<Movie>{} };
for (auto const & m : movies)
std::cout << m << '\n';
}
Output:
Nr. 1: "The Godfather" (1972)
Nr. 2: "The Shawshank Redemption" (1994)
Nr. 3: "Schindler's List" (1993)
Nr. 4: "Raging Bull" (1980)
Nr. 5: "Citizen Kane" (1941)

How to use getline to delimit by comma *and* <space> in c++?

chicken, for sale, 60
microwave, wanted, 201.
These are examples lines from my txt file. Right now this is my code:
while(getline(data, word, '\n')){
ss<<word;
while(getline(ss, word, ',')){//prints entire file
cout<<word<<endl;
}
}
and my output is:
chicken
for sale
60
My file is successfully parsed line by line, but I also need to get rid of that space after each comma. Adding a space after the comma here just gives me an error "no matching function to call 'getline(...:
while(getline(ss, word, ', '))
SOLUTION: I just used the erase function
if(word[0]==' '){//eliminates space
word.erase(0,1);
}
getline accepts only a single delimiter character.
If you want to split on multiple delimiters, you can use the Boost library.
std::string delimiters("|,:-;");
std::vector<std::string> parts;
boost::split(parts, inputString, boost::is_any_of(delimiters));
for(std::size_t i = 0; i<parts.size();i++ ) {
std::cout <<parts[i] << " ";
}
You could use std::ws to get rid of any leading whitespace of each part:
while(getline(ss >> std::ws, word, ','))
Try something like this:
std::string line;
std::string tok;
while (std::getline(data, line))
{
std::istringstream iss(line);
while (std::getline(iss >> std::ws, tok, ',')) {
tok.erase(tok.find_last_not_of(" \t\r\n") + 1);
std::cout << tok << std::endl;
}
}
You can then wrap the above logic in a custom overloaded >> operator:
class token : public std::string {};
std::istream& operator>>(std::istream &in, token &out)
{
out.clear();
if (std::getline(in >> std::ws, out, ','))
out.erase(out.find_last_not_of(" \t\r\n") + 1);
return in;
}
std::string line;
token tok;
while (std::getline(data, line))
{
std::istringstream iss(line);
while (iss >> tok) {
std::cout << tok << std::endl;
}
}

C++ Read file line by line then split each line using the delimiter

I have searched and got a good grasp of the solution from a previous post, but I am still a bit stuck.
My problem is instead of "randomstring TAB number TAB number NL"
My data is "number (space colon space) number (space colon space) sentence"
I've edited the code below but still can't get it to work 100%, because the parameters getline takes are (stream, string, delimiter).
For some reason, it also only gets the first word of the sentence and not the rest.
Previous post
I want to read a txt file line by line and after reading each line, I want to split the line according to the tab "\t" and add each part to an element in a struct.
my struct is 1*char and 2*int
struct myStruct
{
char chr;
int v1;
int v2;
};
where chr can contain more than one character.
A line should be something like:
randomstring TAB number TAB number NL
SOLUTION
std::ifstream file("plop");
std::string line;
while(std::getline(file, line))
{
std::stringstream linestream(line);
std::string data;
int val1;
int val2;
// If you have truly tab delimited data use getline() with third parameter.
// If your data is just white space separated data
// then the operator >> will do (it reads a space separated word into a string).
std::getline(linestream, data, '\t'); // read up-to the first tab (discard tab).
// Read the integers using the operator >>
linestream >> val1 >> val2;
}
Because your lines contain no tab characters, the following line reads the complete line into data. linestream is then fully consumed, so further reads will not yield anything.
std::getline(linestream, data, '\t'); // read up-to the first tab (discard tab).
Instead, you can just work on the line like this
while (std::getline(file, line))
{
int token1 = std::stoi(line.substr(0, line.find(" : ")));
line.erase(0, line.find(" : ") + 3);
int token2 = std::stoi(line.substr(0, line.find(" : ")));
line.erase(0, line.find(" : ") + 3);
std::string token3 = line;
}
What exactly is your problem?
#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>
#include <sstream>
#include <vector>
struct Data
{
int n1;
int n2;
std::string sequence;
};
std::ostream& operator<<(std::ostream& ostr, const Data& data)
{
ostr << "{" << data.n1 << "," << data.n2 << ",\"" << data.sequence << "\"}";
return ostr;
}
std::string& ltrim(std::string& s, const char* t = " \t")
{
s.erase(0, s.find_first_not_of(t));
return s;
}
std::string& rtrim(std::string& s, const char* t = " \t")
{
s.erase(s.find_last_not_of(t) + 1);
return s;
}
std::string& trim(std::string& s, const char* t = " \t")
{
return ltrim(rtrim(s, t), t);
}
int main()
{
std::string file_content{
"1\t1\t\n"
"2\t2\tsecond sequence\t\n"
"3\t3\tthird sequence\n"};
std::istringstream file_stream{file_content};
std::string line;
std::vector<Data> content;
while(std::getline(file_stream, line))
{
std::istringstream line_stream{line};
Data data{};
if(!(line_stream >> data.n1 >> data.n2))
{
std::cout << "Failed to parse line (numbers):\n" << line << "\n";
break;
}
auto numbers_end = line_stream.tellg();
if(numbers_end == -1)
{
std::cout << "Failed to parse line (sequence):\n" << line << "\n";
break;
}
data.sequence = line.substr(numbers_end);
trim(data.sequence);
content.push_back(std::move(data));
}
std::copy(content.cbegin(), content.cend(),
std::ostream_iterator<Data>(std::cout, "\n"));
}

How to save and restore an std::istringstream's buffer?

I am using an istringstream to read a string word by word. However, when my condition fails I need to be able to revert the istringstream to its state before the previous word was read. My example code works, but I want to know if there is a more direct way to use streams to accomplish this.
std::string str("my string");
std::string word;
std::istringstream iss(str);
std::ostringstream ossBackup;
ossBackup << iss.rdbuf(); // Writes contents of buffer and in the process changes the buffer
std::string strBackup(ossBackup.str()); // Buffer has been saved as string
iss.str(strBackup); // Use string to restore iss's buffer
iss.clear(); // Clear error states
iss >> word; // Now that I have a backup read the 1st word ("my" was read)
// Revert the `istringstream` to before the previous word was read.
iss.str(strBackup); // Restore iss to before last word was read
iss.clear(); // Clear error states
iss >> word; // "my" was read again
You can use tellg() and seekg() to save and restore your position if you like:
#include <iostream>
#include <string>
#include <sstream>
int main()
{
std::istringstream iss("some text");
std::string word;
// save the position
std::streampos pos = iss.tellg();
// read a word
if(iss >> word)
std::cout << word << '\n';
iss.clear(); // clear eof or other errors
iss.seekg(pos); // move to saved position
while(iss >> word)
std::cout << word << '\n';
}
This is only guaranteed to work for string streams, but you can repeatedly call unget() until you reach a space character:
#include <iostream>
#include <sstream>
template <int n>
std::istream& back(std::istream& is)
{
bool state = is.good();
auto& f = std::use_facet<std::ctype<char>>(is.getloc());
for (int i = 0; i < n && is; ++i)
while (is.unget() && !f.is(f.space, is.peek()));
if (state && !is)
is.clear();
return is;
}
int main()
{
std::stringstream iss("hello world");
std::string str;
std::cout << "Value Before: ";
iss >> str;
std::cout << str << std::endl;
iss >> back<1>; // go back one word
std::cout << "Value after: ";
iss >> str;
std::cout << str;
}

C++ Read file line by line then split each line using the delimiter

I want to read a txt file line by line and after reading each line, I want to split the line according to the tab "\t" and add each part to an element in a struct.
my struct is 1*char and 2*int
struct myStruct
{
char chr;
int v1;
int v2;
};
where chr can contain more than one character.
A line should be something like:
randomstring TAB number TAB number NL
Try:
Note: if chr can contain more than 1 character then use a string to represent it.
std::ifstream file("plop");
std::string line;
while(std::getline(file, line))
{
std::stringstream linestream(line);
std::string data;
int val1;
int val2;
// If you have truly tab delimited data use getline() with third parameter.
// If your data is just white space separated data
// then the operator >> will do (it reads a space separated word into a string).
std::getline(linestream, data, '\t'); // read up-to the first tab (discard tab).
// Read the integers using the operator >>
linestream >> val1 >> val2;
}
Unless you intend to use this struct for C as well, I would replace the intended char* with std::string.
Next, as I intend to be able to read it from a stream I would write the following function:
std::istream & operator>>( std::istream & is, myStruct & my )
{
    if( std::getline(is, my.str, '\t') )
        is >> my.v1 >> my.v2;
    return is; // always return the stream, even when the first read fails
}
with str as the std::string member. This writes into your struct, using tab as the first delimiter and then any white-space delimiter will do before the next two integers. (You can force it to use tab).
To read line by line you can either continue reading these, or read the line first into a string then put the string into an istringstream and call the above.
You will need to decide how to handle failed reads. Any failed read above would leave the stream in a failed state.
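std::ifstream in("fname");
std::string line;
while(std::getline(in,line)){ // loop on the read itself, not on the stream state
    size_t lasttab=line.find_last_of('\t');
    if(lasttab==std::string::npos || lasttab==0) continue; // malformed line, skip it
    size_t firsttab=line.find_last_of('\t',lasttab-1);
    if(firsttab==std::string::npos) continue; // only one tab, skip it
    myStruct data;
    data.chr=line.substr(0,firsttab); // assumes chr has been changed to std::string
    // substr takes (position, length), so compute the length of each field
    data.v1=atoi(line.substr(firsttab+1,lasttab-firsttab-1).c_str());
    data.v2=atoi(line.substr(lasttab+1).c_str());
}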
I had some difficulty following some of the suggestions here, so I'm posting a complete example of overloading both input and output operators for a struct over a tab-delimited file. As a bonus, it also takes the input either from stdin or from a file supplied via the command arguments.
I believe this is about as simple as it gets while adhering to the semantics of the operators.
pairwise.h
#ifndef PAIRWISE_VALUE
#define PAIRWISE_VALUE
#include <string>
#include <iostream>
struct PairwiseValue
{
std::string labelA;
std::string labelB;
float value;
};
std::ostream& operator<<(std::ostream& os, const PairwiseValue& p);
std::istream& operator>>(std::istream& is, PairwiseValue& p);
#endif
pairwise.cc
#include "pairwise.h"
std::ostream& operator<<(std::ostream& os, const PairwiseValue& p)
{
os << p.labelA << '\t' << p.labelB << '\t' << p.value << std::endl;
return os;
}
std::istream& operator>>(std::istream& is, PairwiseValue& p)
{
PairwiseValue pv;
if ((is >> pv.labelA >> pv.labelB >> pv.value))
{
p = pv;
}
return is;
}
test.cc
#include <fstream>
#include "pairwise.h"
int main(const int argc, const char* argv[])
{
std::ios_base::sync_with_stdio(false); // disable synch with stdio (enables input buffering)
std::string ifilename;
if (argc == 2)
{
ifilename = argv[1];
}
const bool use_stdin = ifilename.empty();
std::ifstream ifs;
if (!use_stdin)
{
ifs.open(ifilename);
if (!ifs)
{
std::cerr << "Error opening input file: " << ifilename << std::endl;
return 1;
}
}
std::istream& is = ifs.is_open() ? static_cast<std::istream&>(ifs) : std::cin;
PairwiseValue pv;
while (is >> pv)
{
std::cout << pv;
}
return 0;
}
Compiling
g++ -c pairwise.cc test.cc
g++ -o test pairwise.o test.o
Usage
./test myvector.tsv
cat myvector.tsv | ./test