Why does writing empty std::istringstream.rdbuf() set failbit? - c++

I have learned that I can copy a C++ std::istream to a C++ std::ostream by outputting the istream's rdbuf(). I used it several times and it worked fine.
Today I got into trouble, because this operation sets failbit if the std::istream is empty (at least for std::istringstream). I have written the following code to demonstrate my problem:
    #include <stdio.h>
    #include <sstream>

    int main(int argc, char *argv[])
    {
        std::ostringstream ss;
        ss << std::istringstream(" ").rdbuf(); // this does not set failbit
        printf("fail=%d\n", ss.fail());
        ss << std::istringstream("").rdbuf();  // why does this set failbit ???
        printf("fail=%d\n", ss.fail());
    }
I tried Windows/VS2017 and Linux/gcc-9.2.0 and both behave the same.
I am using std::istream& in a method signature with a default value of std::istringstream(""). The calling code shall be able to pass an optional istream, which is appended to some other data.
Can anybody explain why failbit is set?
Is there a better way to implement this optional std::istream& parameter?
I know, I could write two methods, one with an additional std::istream& parameter, but I want to avoid duplicate code.
Thanks in advance,
Mario
Update 22-Apr-20
I now use the following code:
    #include <stdio.h>
    #include <sstream>

    int main(int argc, char *argv[])
    {
        std::ostringstream out;
        std::istringstream in("");
        while (in)
        {
            char Buffer[4096];
            in.read(Buffer, sizeof(Buffer));
            out.write(Buffer, in.gcount());
        }
        printf("fail=%d\n", out.fail());
    }
I also added a warning about setting failbit when copying empty files to https://stackoverflow.com/a/10195497/6832488

The documentation of ostream::operator<< (the overload taking a streambuf pointer) describes the following behavior:
Behaves as an UnformattedOutputFunction. After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met:
* end-of-file occurs on the input sequence;
* inserting in the output sequence fails (in which case the character to be inserted is not extracted);
* an exception occurs (in which case the exception is caught).
If no characters were inserted, executes setstate(failbit). If an exception was thrown while extracting, sets failbit and, if failbit is set in exceptions(), rethrows the exception.
As you can tell, that explicitly says that inserting a streambuf with no characters to extract sets the failbit. If you want to essentially make the stream "optional", check that there is something to read before inserting the buffer, or do ss.clear() afterwards to clear the failbit.

Related

Why does std::ios_base::ignore() set the EOF bit?

When I read all data from a stream, but make no attempt to read past its end, the stream's EOF is not set. That's how C++ streams work, right? It's the reason this works:
    #include <sstream>
    #include <cassert>

    char buf[255];

    int main()
    {
        std::stringstream ss("abcdef");
        ss.read(buf, 6);
        assert(!ss.eof());
        assert(ss.tellg() == 6);
    }
However, if instead of read()ing data I ignore() it, EOF is set:
    #include <sstream>
    #include <cassert>

    int main()
    {
        std::stringstream ss("abcdef");
        ss.ignore(6);
        assert(!ss.eof()); // <-- FAILS
        assert(ss.tellg() == 6); // <-- FAILS
    }
This is on GCC 4.8 and GCC trunk (Coliru).
It also has the unfortunate side-effect of making tellg() return -1 (because that's what tellg() does), which is annoying for what I'm doing.
Is this standard-mandated? If so, which passage and why? Why would ignore() attempt to read more than I told it to?
I can't find any reason for this behaviour on cppreference's ignore() page. I can probably .seekg(6, std::ios::cur) instead, right? But I'd still like to know what's going on.
I think this is a libstdc++ bug (42875, h/t NathanOliver). The requirements on ignore() in [istream.unformatted] are:
Characters are extracted until any
of the following occurs:
— n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
— end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit),
which may throw ios_base::failure (27.5.5.4));
— traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character
c (in which case c is extracted).
Remarks: The last condition will never occur if traits::eq_int_type(delim, traits::eof()).
So we have two conditions (the last is ignored) - we either read n characters, or at some point we hit end-of-file in which case we set the eofbit. But, we are able to read n characters from the stream in this case (there are in fact 6 characters in your stream), so we will not hit end-of-file on the input sequence.
In libc++, eof() is not set and tellg() does return 6.

Taking input from cin and storing it in a char variable

I'm taking input using cin and storing it into a char variable. My question is if there is any input that could cause cin.fail() to return true.
I know that trying to store input such as "foo" into an int variable will fail, but is there any case in which this is possible with a char variable?
The overloads for operator>> which take a char follow the normal behavior of a formatted input function, that is they call rdbuf()->sbumpc() or rdbuf()->sgetc() to perform the extraction. Naturally, if eof is encountered, then eofbit is set. If one of the functions throw an exception, then badbit is set. If either of these are set, then failbit is set. There's no evidence to indicate that the operation would fail otherwise. (This is covered under section [istream] in the C++11 draft standard.) For other types, like int, do_get() is used to convert the character (similar to scanf). Of course, the conversion can fail, but no conversion is needed if the input is already a char.
Now the comments are misleading. CTRL+C would kill the application in Linux. CTRL+Z would send a character that signals EOF on some operating systems.
You can even use an emoji and it would work:
    #include <iostream>

    int main()
    {
        char c;
        if (std::cin >> c)
            std::cout << "Huzzah!";
    }
With input 😁 outputs "Huzzah!" as expected.
I guess not, because a char variable will just take the first character from the given input, no matter how long the input is or what type it has (int, long, double, ...).
No; for a char, failbit is only set when there is no input left to extract. A failure of the source itself, AKA someone rips out the USB flash drive containing the file from which you're reading, would set badbit instead. ;)

Behavior of fstream as a bool and fstream.good() function

I recently used fstream for a homework assignment and I was wondering about how two things worked.
    #include <iostream>
    #include <fstream>

    using namespace std;

    int main(int argc, char** argv) {
        ifstream myFile;
        myFile.open("fileone.txt");
        int myInt = 0;
        while (myFile.good()) { // What is the difference between myFile and myFile.good()?
            if (!myFile.eof()) {
                myFile >> myInt;
                cout << myInt << endl;
            }
        }
        return 0;
    }
This is a snippet of my actual code I am working on. In another post, someone said that if I used while(myFile) , it would automatically convert into a bool. What is the difference between using this and using the member function .good() of the ifstream class? I know that .good() breaks out of the while loop when I reach the end of the text file but how does using the stream name behave?
IOStream classes have 4 functions for assessing the stream state: good(), bad(), fail(), and eof(). bad() and eof() each check a single bit in the underlying stream state and return whether that bit is on (is there an error?); fail() checks two bits (failbit and badbit), and good() checks that all the bits are off (is the stream valid?). These are what they are for:
good(): The stream has not encountered an error.
bad(): The stream has encountered an error that affects the integrity of the stream (i.e. memory allocation failure, no buffer, etc.)
fail(): Typically a recoverable error (formatting/parsing failure).
eof(): The end of the input sequence has been reached (note that there is no literal EOF character stored in the stream).
When performing I/O, it is integral that you check for errors in the stream while processing input. What novices typically don't know is that the only function that was meant to be used to check for valid input is fail(). All the other functions are useful in other cases but not for conditioning input.
Furthermore, novices also fail to realize that input must be performed before checking for errors. Doing otherwise allows an unchecked extraction, letting the body of the loop access a value that was not produced by a valid extraction.
Streams have a boolean operator that returns !fail(), this allows you to check the stream in an elegant way after performing input, like this:
    while (myFile >> myInt) {
        // ...
    }
This is the best way to perform input. The extraction itself should be placed in a conditional context so that the body of whatever it's being used in executes only if the extraction succeeded.
Read the manual.
The bool conversion is defined so that the following are the same:
if (stream) { ... }
if (!stream.fail()) { ... }
There is a difference between stream.good() and !stream.fail(): !stream.fail() is also true at end of file.
And one more big issue with your code: you should check if the read is successful before using the input. So this is really bad:
    myFile >> myInt;
    cout << myInt << endl;
because you have not checked if you really succeeded to read an int into myInt.
TLDR:
Use this for reading ints from a file:
    while (myFile >> myInt) {
        cout << myInt << endl;
    }
Reason: myFile >> myInt returns myFile so it will invoke the bool conversion which should be used as the loop condition.

When does `ifstream::readsome` set `eofbit`?

This code loops forever:
    #include <iostream>
    #include <fstream>
    #include <sstream>

    int main(int argc, char *argv[])
    {
        std::ifstream f(argv[1]);
        std::ostringstream ostr;
        while (f && !f.eof())
        {
            char b[5000];
            std::size_t read = f.readsome(b, sizeof b);
            std::cerr << "Read: " << read << " bytes" << std::endl;
            ostr.write(b, read);
        }
    }
It's because readsome is never setting eofbit.
cplusplus.com says:
Errors are signaled by modifying the internal state flags:
eofbit The get pointer is at the end of the stream buffer's internal input
array when the function is called, meaning that there are no positions to be
read in the internal buffer (which may or not be the end of the input
sequence). This happens when rdbuf()->in_avail() would return -1 before the
first character is extracted.
failbit The stream was at the end of the source of characters before the
function was called.
badbit An error other than the above happened.
Almost the same, the standard says:
[C++11: 27.7.2.3]: streamsize readsome(char_type* s, streamsize n);
32. Effects: Behaves as an unformatted input function (as described in
27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls
setstate(failbit) which may throw an exception, and return. Otherwise extracts
characters and stores them into successive locations of an array whose first
element is designated by s. If rdbuf()->in_avail() == -1, calls
setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts
no characters;
If rdbuf()->in_avail() == 0, extracts no characters
If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).
33. Returns: The number of characters extracted.
That the in_avail() == 0 condition is a no-op implies that ifstream::readsome itself is a no-op if the stream buffer is empty, but the in_avail() == -1 condition implies that it will set eofbit when some other operation has led to in_avail() == -1.
This seems like an inconsistency, even despite the "some" nature of readsome.
So what are the semantics of readsome and eof? Have I interpreted them correctly? Are they an example of poor design in the streams library?
(Stolen from the [IMO] invalid libstdc++ bug 52169.)
I think this is a customization point, not really used by the default stream implementations.
in_avail() returns the number of chars it can see in the internal buffer, if any. Otherwise it calls showmanyc() to try to detect if chars are known to be available elsewhere, so a buffer fill request is guaranteed to succeed.
In turn, showmanyc() will return the number of chars it knows about, if any, or -1 if it knows that a read will fail, or 0 if it doesn't have a clue.
The default implementation (basic_streambuf) always returns 0, so that is what you get unless you have a stream with some other streambuf overriding showmanyc.
Your loop is essentially read-as-many-chars-as-you-know-is-safe, and it gets stuck when that is zero (meaning "not sure").
I don't think that readsome() is meant for what you're trying to do (read from a file on disk)... from cplusplus.com:
The function is intended to be used to read binary data from certain
types of asynchronic sources that may wait for more characters, since
it stops reading when the local buffer exhausts, avoiding potential
unexpected delays.
So it sounds like readsome() is intended for streams from a network socket or something like that, and you probably want to just use read().
If no character is available (i.e. gptr() == egptr() for the std::streambuf), the virtual member function showmanyc() is called. I could have an implementation of showmanyc() which returns an error code; why that may be useful is a different question, but it could set eofbit. Of course, in_avail() is meant not to fail and not to block, and just returns the characters known to be available. That is, the loop you have above is essentially guaranteed to be an infinite loop unless you have a rather odd stream buffer.
Others have answered why readsome won't set eofbit by design. I will suggest a way to read some bytes until EOF without setting failbit, in an intuitive way, in the same spirit as your readsome loop. This is the result of research in another question.
    #include <iostream>
    #include <fstream>
    #include <sstream>

    using namespace std;

    streamsize Read(istream &stream, char *buffer, streamsize count)
    {
        // This consistently fails on gcc (Linux) 4.8.1 with failbit set on
        // read failure. It apparently never fails on VS2010 and VS2013
        // (Windows 7).
        streamsize reads = stream.rdbuf()->sgetn(buffer, count);

        // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
        // failure of the previous sgetn().
        stream.rdstate();

        // On gcc (Linux) 4.8.1 and VS2010/VS2013 (Windows 7) this
        // consistently sets eofbit when the stream is at EOF as a consequence
        // of sgetn(). It should also throw if exceptions are set, or return
        // otherwise, if the previous rdstate() restored a failbit on Windows.
        // On Windows it sets eofbit most of the time, even on a real read
        // failure.
        stream.peek();
        return reads;
    }

    int main(int argc, char *argv[])
    {
        ifstream instream("filepath", ios_base::in | ios_base::binary);
        while (!instream.eof())
        {
            char buffer[0x4000];
            size_t read = Read(instream, buffer, sizeof(buffer));
            // Do something with buffer
        }
    }

Semantics of flags on basic_ios

I find myself repeatedly baffled by the rdstate() flags - good(), bad(), eof(), fail() - and how they are expressed in basic_ios::operator!, operator bool and operator void*.
Could somebody put me out of my misery and explain this so I never have to think twice again?
There are three flags that indicate error state:
badbit means something has gone very wrong with the stream. It might be a buffer error or an error in whatever is feeding data to the stream. If this flag is set, it's likely that you aren't going to be using the stream anymore.
failbit means that an extraction or a read from the stream failed (or a write or insertion for output streams) and you need to be aware of that failure.
eofbit means the input stream has reached its end and there is nothing left to read. Note that this is set only after you attempt to read from an input stream that has reached its end (that is, it is set when an error occurs because you try to read data that isn't there).
The failbit may also be set by many operations that reach EOF. For example, if there is only whitespace left remaining in the stream and you try to read an int, you will both reach EOF and you will fail to read the int, so both flags will be set.
The fail() function tests badbit || failbit.
The good() function tests !(badbit || failbit || eofbit). That is, a stream is good when none of the bits are set.
You can reset the flags by using the ios::clear() member function; this allows you to set any of the error flags; by default (with no argument), it clears all three flags.
Streams do not overload operator bool(); operator void*() is used to implement a somewhat broken version of the safe bool idiom. This operator overload returns null if badbit or failbit is set, and non-null otherwise. You can use this to support the idiom of testing the success of an extraction as the condition of a loop or other control flow statement:
    if (std::cin >> x) {
        // extraction succeeded
    }
    else {
        // extraction failed
    }
The operator!() overload is the opposite of the operator void*(); it returns true if the badbit or failbit is set and false otherwise. The operator!() overload is not really needed anymore; it dates back to before operator overloads were supported completely and consistently (see sbi's question "Why does std::basic_ios overload the unary logical negation operator?").
C++0x fixes the problem that causes us to have to use the safe bool idiom, so in C++0x the basic_ios base class template does overload operator bool() as an explicit conversion operator; this operator has the same semantics as the current operator void*().
In addition to James' answer, it's important to remember that these flags indicate results of operations, so won't be set unless you perform one.
A common error is to do this:
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        std::ifstream file("main.cpp");
        while (!file.eof()) // while the file isn't at eof...
        {
            std::string line;
            std::getline(file, line); // ...read a line...
            std::cout << "> " << line << std::endl; // and print it
        }
    }
The problem here is that eof() won't be set until a read actually hits the end of the file, so the loop runs one extra time: the final getline() extracts nothing, the stream says "nope, no more!", and we print a bogus empty line first. This means the "correct" way is:
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        std::ifstream file("main.cpp");
        for (;;)
        {
            std::string line;
            std::getline(file, line); // read a line...
            if (file.eof()) // ...and check whether we hit eof
                break;
            std::cout << "> " << line << std::endl;
        }
    }
This places the check in the correct location, but it's very unwieldy. Luckily for us, the return value of std::getline is the stream itself, and the stream has a conversion operator that allows it to be tested in a boolean context, with the value of !fail(). Since getline() sets failbit when it extracts nothing at end of file, hitting EOF ends the loop exactly when input runs out. So we can just write:
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        std::ifstream file("main.cpp");
        std::string line;
        while (std::getline(file, line)) // get line, test if it was eof
            std::cout << "> " << line << std::endl;
    }