Read a file line-by-line twice using stringstream - c++

I need to read a file line-by-line twice. The file content is expected to fit into memory, so I would normally read the whole file into a buffer and work with that buffer afterwards.
However, since I would like to use std::getline, I need to work with a std::basic_istream. So, I thought it would be a good idea to write
std::ifstream file(filepath);
std::stringstream ss;
ss << file.rdbuf();
for (std::string line; std::getline(ss, line);)
{
}
However, I'm not sure what exactly is happening here. I guess ss << file.rdbuf(); does not read the file into any internal buffer of ss. Actual file access should occur only at std::getline(ss, line);.
So, with a second for-loop of the provided form, I would end up reading the whole file once again. That's inefficient.
Am I correct and hence need to come up with another approach?

I guess ss << file.rdbuf(); does not read the file into any internal buffer of ss. Actual file access should occur only at std::getline(ss, line);.
This is incorrect. cppreference.com has this to say about that operator<< overload:
basic_ostream& operator<<( std::basic_streambuf<CharT, Traits>* sb); (9)
9) Behaves as an UnformattedOutputFunction. After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met:
end-of-file occurs on the input sequence;
inserting in the output sequence fails (in which case the character to be inserted is not extracted);
an exception occurs (in which case the exception is caught).
If no characters were inserted, executes setstate(failbit). If an exception was thrown while extracting, sets failbit and, if failbit is set in exceptions(), rethrows the exception.
So your assumption is incorrect. The entire contents of file is copied to the buffer controlled by ss, so reading from ss does not access the filesystem. You can freely read through ss and seek back to the beginning as many times as you like without incurring the overhead of re-reading the file each time.

After the first loop, clear the EOF and fail bits and go back to the beginning of the stringstream with:
ss.clear();
ss.seekg(0, std::ios::beg);
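Putting the two answers above together, a minimal sketch of the two-pass approach might look like this. The helper names slurp and two_passes are made up for illustration, and the work done in each pass (counting lines, finding the longest line) is only a placeholder.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: copies everything from `in` (for example an
// std::ifstream) into an in-memory stream.
std::stringstream slurp(std::istream& in)
{
    std::stringstream ss;
    ss << in.rdbuf();   // the whole input is copied here, not later in getline
    return ss;
}

// Hypothetical helper: the first pass counts lines, the second pass finds the
// length of the longest line. Both passes read only from memory.
std::pair<std::size_t, std::size_t> two_passes(std::stringstream& ss)
{
    std::size_t lines = 0;
    for (std::string line; std::getline(ss, line);)
        ++lines;

    ss.clear();                  // reset eofbit/failbit left by the first pass
    ss.seekg(0, std::ios::beg);  // move the read position back to the start

    std::size_t longest = 0;
    for (std::string line; std::getline(ss, line);)
        if (line.size() > longest)
            longest = line.size();

    return {lines, longest};
}
```

With a real file you would pass an std::ifstream to slurp; the file is then read once, and both loops run over the copy held by the stringstream.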

Am I correct and hence need to come up with another approach?
You're not correct. The "hence" is unwarranted also. There's not enough info in the question, but I suspect the problem has nothing to do with using a stream buffer.
Without knowing what that first "garbage" character is, I cannot say for sure, but I suspect the file is in a wide-character Unicode format, and you are using access operations that do not work on wide characters. If that is the case, buffering the file has nothing to do with the problem.
As an experiment, try the following. Mind the w's.
std::wifstream file(filepath);
std::wstringstream ss;
ss << file.rdbuf();
for (int i = 0; i < 42; ++i) {
    wchar_t ch;
    ss >> ch;
    std::cout << static_cast<unsigned>(ch) << ' ';
}
It would not surprise me if the first four numbers are 255 254 92 0, or 255 254 47 0.
This might help: Problem using getline with unicode files

Related

Using stringstream object multiple times

I am finding it difficult to wrap my head around working of stringstream. Why the second while loop in the below code does not work? If stream object is getting emptied at the end of the first while loop is there any workaround to restore it back to initial condition?
// input is a string of numbers separated by spaces (e.g. "22 1 2 4")
std::string input;
std::getline(std::cin, input);
std::stringstream stream(input);
int n;
// print individual numbers
while (stream >> n)
{
    std::cout << n << std::endl;
}
// print individual numbers again
while (stream >> n)
{
    std::cout << n << std::endl;
}
stringstream is a subclass of istream, so stream >> n (std::istream::operator>>) returns a reference to istream
stream can be converted to bool (std::ios::operator bool): it converts to false when it no longer has any data (reached end-of-file)
You have finished reading stream in your first loop - it no longer has any data.
If stream object is getting emptied at the end of the first while loop is there any workaround to restore it back to initial condition?
You need to store values on your own and then reuse them - copying streams is not allowed (it doesn't make sense for them really) - Why copying stringstream is not allowed?
It's not emptied, but once you reach the end, you're stuck at the end – just like with other streams.
You need to clear the error flags (stream.clear()) and then either rewind (stream.seekg(0)) or reset the input string (stream.str(input)).
You need to create the stringstream first in order to make multiple passes over what you have read into input. input itself is just a string, not a stream. #include <sstream> and then, after reading input, create the stringstream with:
std::stringstream stream (input);
You can then read with your first while loop, but the second while will not work, because the stream position is left at the end of the stringstream after the first while, and both eofbit and failbit are set.
Before the second while loop you need to call stream.clear() to reset those flags and then stream.seekg(0) to "rewind" the stream; see std::basic_istream::seekg. Since C++11, seekg clears eofbit by itself, but it does nothing while failbit is still set.
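As a small sketch of the clear-then-rewind approach described above (the function names read_all and read_twice are invented for this example):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads ints from `stream` until an extraction fails.
std::vector<int> read_all(std::istream& stream)
{
    std::vector<int> values;
    for (int n; stream >> n;)
        values.push_back(n);
    return values;
}

// Makes two passes over the same text, rewinding in between.
std::pair<std::vector<int>, std::vector<int>> read_twice(const std::string& input)
{
    std::stringstream stream(input);
    std::vector<int> first = read_all(stream);

    stream.clear();   // drop eofbit and failbit left by the failed extraction
    stream.seekg(0);  // move the read position back to the start

    std::vector<int> second = read_all(stream);
    return {first, second};
}
```

Calling stream.str(input) instead of stream.seekg(0) would also work here, since it replaces the buffer contents and starts reading from the beginning again.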

C++ Read in file with only numbers (doubles)

I'm trying to read in a file that should contain only numbers in it. I can successfully read in the entire file if it meets that criteria, but if it so happened to have a letter in it, I need to return false with an error statement.
The problem is I'm finding it hard for my program to error when it finds this character. It can find it no problem, but when it does, it decides to just skip over it.
My code to read in the file and attempt to read in only numbers:
bool compute::Read (ifstream& stream)
{
    double value;
    string line;
    int lineNumber = 1;
    if (stream)
    {
        while (getline(stream, line))
        {
            lineNumber++;
            istringstream strStream(line);
            while (strStream >> value)
            {
                cout << value << endl;
            }
        }
    }
    return true;
}
The input file which I use for this is
70.5 61.2 A8 10.2
2
Notice that there is a non-number character in my input file. It should fail and return false at that point.
Currently, all it does is once it hits the "A", it simply returns to the next line, continuing the getline while loop.
Any help with this would be much appreciated.
The stringstream does catch those errors, but you're doing nothing to stop the enclosing loop from continuing when an error is found. The inner loop ends both when the line is exhausted and when a token is invalid, so you need to tell the two cases apart: after the inner loop, eofbit is set only if the stringstream reached the end of the line. You can also use a for() loop and construct the stringstream in the declaration part, so that it is reused instead of being reconstructed on each iteration. For example:
for (std::istringstream iss; std::getline(stream, line);)
{
    iss.clear();
    iss.str(line);
    while (iss >> value)
    {
        std::cout << value << '\n';
    }
    if (!iss.eof())
    {
        // extraction stopped before the end of the line: invalid token
        return false;
    }
}
Furthermore, it doesn't look like you need to use std::getline() or std::istringstream if you just want to print each value. Just do:
while (stream >> value) {
std::cout << value << '\n';
}
The above will stop when it finds an invalid character for a double.
You need the code to stop streaming but return false if it hasn't yet reached the end of the "input".
One way, possibly not the most efficient but still one way, to do that is parse a word at a time.
If you read first into a std::string and if it works (so the string is not empty) create an istringstream from that string, or reuse an existing one, and try streaming that into a double value.
If that fails, you have an invalid character.
Of course you can read a line at a time from the file, then split that into words, so that you can output a meaningful error message showing what line the bad text was found.
The issue of reading straight into doubles is that the stream will fail when it reaches end of file.
However, it is possible to work around that too, because the reason for failing has an error status which you can check, i.e. you can check if eofbit is set. Although the f in eofbit stands for "file", it applies to any stream, not just files.
Although this method may sound better than reading words into a string first, I prefer that method in normal circumstances because you want to be able to report the error so you'll want to print in the error what was read.
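The word-at-a-time idea from this answer could be sketched roughly as follows. ParseResult and read_numbers are hypothetical names, and a word is treated as valid only if a double can be extracted from it with nothing left over.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: when ok is false, `line` and `token` describe the
// first offending word.
struct ParseResult {
    bool ok = true;
    std::size_t line = 0;
    std::string token;
    std::vector<double> values;
};

ParseResult read_numbers(std::istream& in)
{
    ParseResult result;
    std::string text;
    std::size_t lineNumber = 0;
    while (std::getline(in, text)) {
        ++lineNumber;
        std::istringstream words(text);
        for (std::string word; words >> word;) {
            std::istringstream conv(word);
            double value;
            // Valid only if a double was parsed and no characters remain.
            if (!(conv >> value) || conv.peek() != std::char_traits<char>::eof()) {
                result.ok = false;
                result.line = lineNumber;
                result.token = word;
                return result;
            }
            result.values.push_back(value);
        }
    }
    return result;
}
```

Because the offending word and its line number are kept, the caller can print an error message that shows exactly what was read.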

What's the difference between read, readsome, get, and getline?

What is the difference between these functions? When I use them they all do the same thing. For example, all four calls return "hello":
#include <iostream>
#include <sstream>
int main()
{
    std::stringstream ss("hello");
    char x[10] = {0};
    ss.read(x, sizeof(x)); // #1
    std::cout << x << std::endl;
    ss.clear();
    ss.seekg(0, ss.beg);
    ss.readsome(x, sizeof(x)); // #2
    std::cout << x << std::endl;
    ss.clear();
    ss.seekg(0, ss.beg);
    ss.get(x, sizeof(x)); // #3
    std::cout << x << std::endl;
    ss.clear();
    ss.seekg(0, ss.beg);
    ss.getline(x, sizeof(x)); // #4
    std::cout << x << std::endl;
}
get and getline are quite similar, when get is called with parameters ( char_type* s, std::streamsize count ). However, get reads from the stream until a delimiter is found, and then leaves it there. getline by comparison will pull the delimiter off the stream, but then drop it. It won't be added to the buffer it fills.
get looks for \n, and when a specific number of characters is provided in an argument (say, count) it will read up to count - 1 characters before stopping. read will pull in all count of them.
You could envisage read as being an appropriate action on a binary datasource, reading a specific number of bytes. get would be more appropriate on a text stream, when you're reading into a string that you'd like null-terminated, and where things like newlines have useful syntactic meanings splitting up text.
readsome only returns characters that are immediately available in the underlying buffer, something which is a bit nebulous and implementation specific. This probably includes characters returned to the stream using putback, for example. The fact that you can't see the difference between read and readsome just shows that the two might share an implementation on the particular stream type and library you are using.
I've observed the difference between read() and readsome() on a flash filing system.
The underlying stream reads 8k blocks and the read method will go for the next block to satisfy the caller, whereas the readsome method is allowed to return less than the request in order to avoid spending time fetching the next block.
The main difference between get() and getline() is that get() leaves the newline character in the input stream, making it the first character seen by the next input operation, whereas getline() extracts and discards the newline character from the input stream.
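The differences described above only become visible when the input contains a newline or is shorter than the requested count. A small sketch follows; the function names are made up, and each input is assumed to be shorter than the 16-byte buffer.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Returns the character the next read would see after get().
char next_after_get(const std::string& text)
{
    std::istringstream in(text);
    char buf[16];
    in.get(buf, sizeof buf);       // stops before '\n' and leaves it in the stream
    return static_cast<char>(in.peek());
}

// Returns the character the next read would see after getline().
char next_after_getline(const std::string& text)
{
    std::istringstream in(text);
    char buf[16];
    in.getline(buf, sizeof buf);   // extracts '\n' and discards it
    return static_cast<char>(in.peek());
}

// read() asks for exactly `count` characters, so a short input sets failbit
// even though gcount() reports the characters that were stored.
bool read_fails_on_short_input(const std::string& text)
{
    std::istringstream in(text);
    char buf[16];
    in.read(buf, sizeof buf);
    return in.fail() && in.gcount() == static_cast<std::streamsize>(text.size());
}
```

For the text "ab\ncd", next_after_get returns '\n' while next_after_getline returns 'c'.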

Limiting input size with std::setw in std::cin

Let's say I have sample code:
std::string s;
std::cin >> std::setw(4) >> s;
std::cout << s;
Now for input abcdef the result will be abcd and for abcd it will be abcd too. The question is how can I check whether the string was split in the middle due to the limit or the result string is the actual one? I need to know whether the input fits or some data was skipped.
Although I know that the stream's width is considered when reading into a char* I wasn't aware that it is also considered when reading into a std::string. Assuming it is, reading would stop under three conditions:
The stream is completely read in which case eof() is set.
The next character is a space.
The number of characters which need to be read are read.
That is, you can check in.eof() and std::isspace(in.peek()). Well, if there is a funny std::ctype<char> facet used by the stream, you'd really need to use
std::isspace(in.getloc(),
std::char_traits<char>::to_char_type(in.peek()));
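A minimal sketch of that check, using the plain std::isspace from <cctype> rather than the locale-aware overload (read_limited is a hypothetical name). Note that for std::string the width is the maximum number of characters stored, so setw(4) yields up to four characters, whereas a char* target reserves one slot for the terminating null.

```cpp
#include <cctype>
#include <iomanip>
#include <istream>
#include <string>

// Reads one token of at most `limit` characters. `truncated` is set when more
// non-space characters follow immediately, i.e. the limit cut the token short.
std::string read_limited(std::istream& in, int limit, bool& truncated)
{
    std::string s;
    in >> std::setw(limit) >> s;

    truncated = false;
    if (in && !in.eof()) {
        const int next = in.peek();   // does not consume the character
        truncated = next != std::char_traits<char>::eof()
                 && !std::isspace(static_cast<unsigned char>(next));
    }
    return s;
}
```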

C++ fstream: how to know size of string when reading?

...as someone may remember, I'm still stuck on C++ strings. Ok, I can write a string to a file using a fstream as follows
outStream.write((char *) s.c_str(), s.size());
When I want to read that string, I can do
inStream.read((char *) s.c_str(), s.size());
Everything works as expected. The problem is: if I change the length of my string after writing it to a file and before reading it again, printing that string won't bring me back my original string but a shorter/longer one. So: if I have to store many strings on a file, how can I know their size when reading it back?
Thanks a lot!
You shouldn’t be using the unformatted I/O functions (read() and write()) if you just want to write ordinary human-readable string data. Generally you only use those functions when you need to read and write compact binary data, which for a beginner is probably unnecessary. You can write ordinary lines of text instead:
std::string text = "This is some test data.";
{
std::ofstream file("data.txt");
file << text << '\n';
}
Then read them back with getline():
{
std::ifstream file("data.txt");
std::string line;
std::getline(file, line);
// line == text
}
You can also use the regular formatting operator >> to read, but when applied to string, it reads tokens (nonwhitespace characters separated by whitespace), not whole lines:
{
std::ifstream file("data.txt");
std::vector<std::string> words;
std::string word;
while (file >> word) {
words.push_back(word);
}
// words == {"This", "is", "some", "test", "data."}
}
All of the formatted I/O functions automatically handle memory management for you, so there is no need to worry about the length of your strings.
Although your writing solution is more or less acceptable, your reading solution is fundamentally flawed: it uses the internal storage of your old string as a character buffer for your new string, which is very, very bad (to put it mildly).
You should switch to a formatted way of reading and writing the streams, like this:
Writing:
outStream << s;
Reading:
inStream >> s;
This way you would not need to bother determining the lengths of your strings at all.
This code is different in that it stops at whitespace characters; you can use getline if you want to stop only at \n characters.
You can write the strings and write an additional 0 (null terminator) to the file. Then it will be easy to separate strings later. Also, you might want to read and write lines
outfile << string1 << endl;
getline(infile, string2, '\n');
If you want to use unformatted I/O your only real options are to either use a fixed size or to prepend the size somehow so you know how many characters to read. Otherwise, when using formatted I/O it somewhat depends on what your strings contain: if they can contain all viable characters, you would need to implement some sort of quoting mechanism. In simple cases, where strings consist e.g. of space-free sequence, you can just use formatted I/O and be sure to write a space after each string. If your strings don't contain some character useful as a quote, it is relatively easy to process quotes:
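// Input manipulator: expects an opening quote and sets failbit otherwise.
std::istream& quote(std::istream& in) {
    char c;
    if (in >> c && c != '"') {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}
// Writing: wrap the string in quotes.
out << '"' << string << '"';
// Reading: skip whitespace, consume the opening quote, read up to the closing quote.
std::getline(in >> std::ws >> quote, string, '"');
Obviously, you might want to bundle this functionality into a class.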