Semantics of flags on basic_ios - c++

I find myself repeatedly baffled by the rdstate() flags - good(), bad(), eof(), fail() - and how they are expressed in basic_ios::operator!, operator bool and operator void*.
Could somebody put me out of my misery and explain this so I never have to think twice again?

There are three flags that indicate error state:
badbit means something has gone very wrong with the stream. It might be a buffer error or an error in whatever is feeding data to the stream. If this flag is set, it's likely that you aren't going to be using the stream anymore.
failbit means that an extraction or a read from the stream failed (or a write or insertion for output streams) and you need to be aware of that failure.
eofbit means the input stream has reached its end and there is nothing left to read. Note that this is set only after you attempt to read from an input stream that has reached its end (that is, it is set when an error occurs because you try to read data that isn't there).
The failbit may also be set by many operations that reach EOF. For example, if there is only whitespace left remaining in the stream and you try to read an int, you will both reach EOF and you will fail to read the int, so both flags will be set.
The fail() function tests badbit || failbit.
The good() function tests !(badbit || failbit || eofbit). That is, a stream is good when none of the bits are set.
You can reset the flags by using the ios::clear() member function; this allows you to set any of the error flags; by default (with no argument), it clears all three flags.
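The rules above can be checked directly. The following is a minimal sketch, assuming a C++11 or later compiler (for the explicit bool conversion); `describe`, `forced_state` and `after_int_read` are hypothetical helper names, not library functions.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Formats the four query functions and the bool conversion of a stream.
std::string describe(const std::ios& s) {
    std::ostringstream out;
    out << "good=" << s.good() << " eof=" << s.eof()
        << " fail=" << s.fail() << " bad=" << s.bad()
        << " bool=" << static_cast<bool>(s);
    return out.str();
}

// Forces a stream into exactly `state`; clear(state) replaces all flags.
std::string forced_state(std::ios_base::iostate state) {
    std::istringstream s;
    s.clear(state);
    return describe(s);
}

// Tries to extract one int from `text` and reports the resulting state.
std::string after_int_read(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    return describe(in);
}
```

For example, `forced_state(std::ios_base::eofbit)` reports that good() is false while fail() is also false, and `after_int_read("   ")` reports both eof() and fail() as true, which is the whitespace-only case described above.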
In C++03, streams do not overload operator bool(); operator void*() is used to implement a somewhat broken version of the safe bool idiom. This operator overload returns null if badbit or failbit is set, and non-null otherwise. You can use this to support the idiom of testing the success of an extraction as the condition of a loop or other control flow statement:
if (std::cin >> x) {
// extraction succeeded
}
else {
// extraction failed
}
The operator!() overload is the opposite of the operator void*(); it returns true if the badbit or failbit is set and false otherwise. The operator!() overload is not really needed anymore; it dates back to before operator overloads were supported completely and consistently (see sbi's question "Why does std::basic_ios overload the unary logical negation operator?").
C++11 (formerly C++0x) fixes the problem that forced us to use the safe bool idiom: since C++11, the basic_ios base class template overloads operator bool() as an explicit conversion operator, with the same semantics as the older operator void*().

In addition to James' answer, it's important to remember that these flags indicate results of operations, so won't be set unless you perform one.
A common error is to do this:
#include <fstream>
#include <iostream>
#include <string>
int main()
{
std::ifstream file("main.cpp");
while (!file.eof()) // while the file isn't at eof...
{
std::string line;
std::getline(file, line); // ...read a line...
std::cout << "> " << line << std::endl; // and print it
}
}
The problem here is that eofbit isn't set until a read actually runs into the end of the input. After the last line has been read, eof() can still be false, so the loop runs once more, getline fails, and an empty line is printed. This means the "correct" way is:
#include <fstream>
#include <iostream>
#include <string>
int main()
{
std::ifstream file("main.cpp");
for (;;)
{
std::string line;
std::getline(file, line); // read a line...
if (!file) // ...and stop if that read failed (e.g. we were at eof)
break;
std::cout << "> " << line << std::endl;
}
}
This places the check in the correct location. It is rather unwieldy, though. Luckily for us, the return value of std::getline is the stream, and the stream has a conversion operator that allows it to be tested in a boolean context; the result is !fail(). Because getline sets failbit when it reaches the end of the file without extracting any characters, this covers the end-of-file case too. So we can just write:
#include <fstream>
#include <iostream>
#include <string>
int main()
{
std::ifstream file("main.cpp");
std::string line;
while (std::getline(file, line)) // get line, test if it was eof
std::cout << "> " << line << std::endl;
}

Related

Why does writing empty std::istringstream.rdbuf() set failbit?

I have learned that I can copy a C++ std::istream to a C++ std::ostream by outputting the istream's rdbuf(). I have used this several times and it worked fine.
Today I got into trouble, because this operation sets failbit if the std::istream is empty (at least for std::istringstream). I have written the following code to demonstrate my problem:
#include <stdio.h>
#include <sstream>
int main(int argc, char *argv[])
{
std::ostringstream ss;
ss << std::istringstream(" ").rdbuf(); // this does not set failbit
printf("fail=%d\n", ss.fail());
ss << std::istringstream("").rdbuf(); // why does this set failbit ???
printf("fail=%d\n", ss.fail());
}
I tried Windows/VS2017 and Linux/gcc-9.20 and both behave the same.
I am using std::istream& in a method signature with a default value of std::istringstream(""). The calling code shall be able to pass an optional istream, which is appended to some other data.
Can anybody explain why failbit is set?
Is there a better way to implement this optional std::istream& parameter?
I know, I could write two methods, one with an additional std::istream& parameter, but I want to avoid duplicate code.
Thanks in advance,
Mario
Update 22-Apr-20
I now use the following code:
#include <stdio.h>
#include <sstream>
int main(int argc, char *argv[])
{
std::ostringstream out;
std::istringstream in("");
while (in)
{
char Buffer[4096];
in.read(Buffer, sizeof(Buffer));
out.write(Buffer, in.gcount());
}
printf("fail=%d\n", out.fail());
}
I also added a warning about setting failbit when copying empty files to https://stackoverflow.com/a/10195497/6832488
The documentation of ostream::operator<< describes the following behavior for reading streams:
Behaves as an UnformattedOutputFunction. After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met:
* end-of-file occurs on the input sequence;
* inserting in the output sequence fails (in which case the character to be inserted is not extracted);
* an exception occurs (in which case the exception is caught).
If no characters were inserted, executes setstate(failbit). If an exception was thrown while extracting, sets failbit and, if failbit is set in exceptions(), rethrows the exception.
As you can tell, that explicitly says that trying to insert an empty buffer will set the failbit. If you want to essentially make it "optional", just check that the stream is good before inserting the buffer and do ss.clear() afterwards to clear the failbit.

testing an istream object

When I use a std::istream object (in the example below from cplusplus.com, a std::ifstream) in a test such as "if (myistreamobject)", the object, which is automatically allocated on the stack, is never null, right? In the example below, the same test is used to check whether all the bytes were read from the file. That looks really strange to me; I usually use that style when I'm dealing with pointers.
I want to know which mechanism std::istream uses to produce a value in tests, and what that value really means (the success/failure of the last operation?). Is it an overloaded bool conversion (like the const char* cast operator in the MFC class CString), or is it another technique?
Because the object is never null, so putting it in a test will always return true.
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
int main () {
std::ifstream is ("test.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
std::cout << "Reading " << length << " characters... ";
// read data as a block:
is.read (buffer,length);
if (is) // <== this is really odd
std::cout << "all characters read successfully.";
else
std::cout << "error: only " << is.gcount() << " could be read";
is.close();
// ...buffer contains the entire file...
delete[] buffer;
}
return 0;
}
if (expression)
tests whether expression evaluates to true when converted to a boolean. It works for pointers because nullptr/NULL/0 evaluate to false, and everything else to true. It works for integral values for the same reason.
For an object, it falls to operator bool(), see http://en.cppreference.com/w/cpp/io/basic_ios/operator_bool.
Checks whether the stream has no errors.
1) Returns a null pointer if fail() returns true, otherwise returns a non-null pointer. This pointer is implicitly convertible to bool and may be used in boolean contexts.
2) Returns true if the stream has no errors and is ready for I/O operations. Specifically, returns !fail().
This operator makes it possible to use streams and functions that return references to streams as loop conditions, resulting in the idiomatic C++ input loops such as while(stream >> value) {...} or while(getline(stream, string)){...}.
Such loops execute the loop's body only if the input operation succeeded.
The
operator bool()
returns true if the stream has no errors, false otherwise.
The "no error" concept is something related with the previous operation done on the stream itself.
For example: after you invoke the constructor
std::ifstream is ("test.txt", std::ifstream::binary);
An internal status flag in the stream object is set. So when you invoke operator bool, you check whether the construction (that is, opening the file) succeeded or not.
Moreover the method
is.read(...)
also sets this internal status flag, as you can see in the reference:
Errors are signaled by modifying the internal state flags: eofbit, failbit, badbit.
So after the method call, if the stream reached the EOF (end-of-file) before all requested characters could be read, both eofbit and failbit are set, and operator bool will return false.
That means in that case when you test the stream with
if (is) { ... }
the condition is false and the else-branch is taken; the if-branch is taken only when neither failbit nor badbit is set.
std::istream has an operator declared like this:
explicit operator bool() const;
When you write
if (SomeStdIstreamObject) { ... } it really calls if (SomeStdIstreamObject.operator bool()) { ... }; it does not check for a non-zero value.

Behavior of fstream as a bool and fstream.good() function

I recently used fstream for a homework assignment and I was wondering about how two things worked.
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char** argv) {
ifstream myFile;
myFile.open("fileone.txt");
int myInt = 0;
while (myFile.good()) { // What is the difference between myFile and myFile.good()?
if (!myFile.eof()){
myFile >> myInt;
cout << myInt << endl;
}
}
return 0;
}
This is a snippet of my actual code I am working on. In another post, someone said that if I used while(myFile) , it would automatically convert into a bool. What is the difference between using this and using the member function .good() of the ifstream class? I know that .good() breaks out of the while loop when I reach the end of the text file but how does using the stream name behave?
IOStream classes have 4 functions for assessing the stream state: good(), bad(), fail(), and eof(). bad() and eof() each check a single bit in the underlying stream state and return whether or not it is on; fail() returns true if either failbit or badbit is on. good() checks that all the bits are off (is the stream valid?). These are what they are for:
good(): The stream has not encountered an error.
bad(): The stream has encountered an error that affects the integrity of the stream (e.g. memory allocation failure, no buffer, etc.)
fail(): Typically a recoverable error (formatting/parsing failure).
eof(): The end of the input (EOF) has been reached.
When performing I/O, it is essential that you check for errors in the stream while processing input. What novices typically don't know is that the only function that was meant to be used to check for valid input is fail(). All the other functions are useful in other cases but not for conditioning input.
Furthermore, novices also fail to realize that input must be performed before checking for errors. Doing otherwise allows an unchecked extraction, allowing the body of the loop to access a value that was not produced by a valid extraction.
Streams have a boolean conversion operator that returns !fail(). This allows you to check the stream in an elegant way after performing input, like this:
while (myFile >> myInt) {
// ...
}
This is the best way to perform input. The extraction itself should be present within a conditional context so that the body of whatever it's being used in is executed only if the extraction succeeded.
Read the manual.
The bool conversion is defined so that the following are the same:
if (stream) { ... }
if (!stream.fail()) { ... }
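There is a difference between stream.good() and !stream.fail(): !stream.fail() can still be true at the end of the file (when only eofbit is set), whereas stream.good() is false as soon as any bit, including eofbit, is set.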
And one more big issue with your code: you should check if the read is successful before using the input. So this is really bad:
myFile >> myInt;
cout << myInt << endl;
because you have not checked if you really succeeded to read an int into myInt.
TLDR:
Use this for reading ints from a file:
while (myFile >> myInt) {
cout << myInt << endl;
}
Reason: myFile >> myInt returns myFile so it will invoke the bool conversion which should be used as the loop condition.

istream (ostream) vs. bool

Here is a C++ code which reads as many words
from a given text file as possible until it meets EOF.
string text;
fstream inputStream;
inputStream.open("filename.txt");
while (inputStream >> text)
cout << text << endl;
inputStream.close();
My question is:
what procedure exactly is performed behind on converting the condition of the while loop (i.e., inputStream >> text) into a boolean values (i.e., true or false)?
My own answer for the question is:
To my understanding, inputStream >> text is supposed to return another (file) input stream. The stream seems to be NULL when EOF arrives. The NULL may be defined as 0, which is equivalent to false.
Does my answer make sense? Even if my answer does make sense, such conversion of InputStream to bool doesn't make me so comfortable. :)
what procedure exactly is performed behind on converting the condition of the while loop (i.e., inputStream >> text) into a boolean values (i.e., true or false)?
operator>> returns a reference to the stream.
In C++11 the reference is then converted to a bool by the stream's operator bool() function, which returns the equivalent of !fail().
In C++98 the same is achieved by using operator void*(), and the returned pointer is either NULL to indicate failure or a non-null pointer if fail() is false, which is then implicitly converted to a bool in the while evaluation.
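To make the C++11 mechanism concrete, here is a minimal sketch of a class that can be used in a condition the same way a stream can. `Status` and `branch_taken` are hypothetical names for illustration only; they are not part of the standard library.

```cpp
// A tiny stand-in for a stream: it only remembers whether it has failed.
class Status {
public:
    explicit Status(bool failed) : failed_(failed) {}

    bool fail() const { return failed_; }

    // Same shape as basic_ios::operator bool in C++11: explicit, const,
    // and defined as !fail().
    explicit operator bool() const { return !fail(); }

private:
    bool failed_;
};

// `if (s)` performs a contextual conversion to bool, which is allowed to
// use an explicit conversion operator.
int branch_taken(const Status& s) {
    if (s)
        return 1;  // "then" branch: no failure
    return 0;      // "else" branch: failure
}
```

Because the operator is explicit, something like `int n = Status(false);` does not compile, which is the kind of accidental conversion the old operator void*() trick was trying to prevent.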
I know that the question has already been answered perfectly by user657267, but I am adding one more example to make the answer easier to understand.
// evaluating a stream
#include <iostream> // std::cerr
#include <fstream> // std::ifstream
int main () {
std::ifstream is;
is.open ("test.txt");
if (is) { // <== here, the ifstream is converted to bool
// read file
}
else {
std::cerr << "Error opening 'test.txt'\n";
}
return 0;
}
Ref. http://www.cplusplus.com/reference/ios/ios/operator_bool/

When does `ifstream::readsome` set `eofbit`?

This code loops forever:
#include <iostream>
#include <fstream>
#include <sstream>
int main(int argc, char *argv[])
{
std::ifstream f(argv[1]);
std::ostringstream ostr;
while(f && !f.eof())
{
char b[5000];
std::size_t read = f.readsome(b, sizeof b);
std::cerr << "Read: " << read << " bytes" << std::endl;
ostr.write(b, read);
}
}
That's because readsome never sets eofbit.
cplusplus.com says:
Errors are signaled by modifying the internal state flags:
eofbit The get pointer is at the end of the stream buffer's internal input
array when the function is called, meaning that there are no positions to be
read in the internal buffer (which may or not be the end of the input
sequence). This happens when rdbuf()->in_avail() would return -1 before the
first character is extracted.
failbit The stream was at the end of the source of characters before the
function was called.
badbit An error other than the above happened.
Almost the same, the standard says:
[C++11: 27.7.2.3]: streamsize readsome(char_type* s, streamsize n);
32. Effects: Behaves as an unformatted input function (as described in
27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls
setstate(failbit) which may throw an exception, and return. Otherwise extracts
characters and stores them into successive locations of an array whose first
element is designated by s. If rdbuf()->in_avail() == -1, calls
setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts
no characters;
If rdbuf()->in_avail() == 0, extracts no characters
If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).
33. Returns: The number of characters extracted.
That the in_avail() == 0 condition is a no-op implies that ifstream::readsome itself is a no-op if the stream buffer is empty, but the in_avail() == -1 condition implies that it will set eofbit when some other operation has led to in_avail() == -1.
This seems like an inconsistency, even despite the "some" nature of readsome.
So what are the semantics of readsome and eof? Have I interpreted them correctly? Are they an example of poor design in the streams library?
(Stolen from the [IMO] invalid libstdc++ bug 52169.)
I think this is a customization point, not really used by the default stream implementations.
in_avail() returns the number of chars it can see in the internal buffer, if any. Otherwise it calls showmanyc() to try to detect if chars are known to be available elsewhere, so a buffer fill request is guaranteed to succeed.
In turn, showmanyc() will return the number of chars it knows about, if any, or -1 if it knows that a read will fail, or 0 if it doesn't have a clue.
The default implementation (basic_streambuf) always returns 0, so that is what you get unless you have a stream with some other streambuf overriding showmanyc.
Your loop is essentially read-as-many-chars-as-you-know-is-safe, and it gets stuck when that is zero (meaning "not sure").
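The role of showmanyc() can be illustrated with two deliberately trivial stream buffers. This is only a sketch: `ExhaustedBuf`, `UnknownBuf` and `eof_after_readsome` are hypothetical names, and neither buffer holds any data (C++11 or later assumed for `override`).

```cpp
#include <istream>
#include <streambuf>

// A buffer that claims to know that no more input will ever arrive.
class ExhaustedBuf : public std::streambuf {
protected:
    std::streamsize showmanyc() override { return -1; }
};

// A buffer that keeps the default showmanyc(), which returns 0
// ("no idea whether more input is available").
class UnknownBuf : public std::streambuf {};

// Calls readsome() once on a stream using `buf` and reports whether
// eofbit was set as a result.
bool eof_after_readsome(std::streambuf& buf) {
    std::istream in(&buf);
    char chunk[8];
    in.readsome(chunk, sizeof chunk);
    return in.eof();
}
```

With `ExhaustedBuf`, in_avail() is -1 and readsome() sets eofbit; with `UnknownBuf`, in_avail() is 0, nothing is extracted and no flag changes, which is why a loop waiting for eof() never terminates.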
I don't think that readsome() is meant for what you're trying to do (read from a file on disk)... from cplusplus.com:
The function is intended to be used to read binary data from certain
types of asynchronic sources that may wait for more characters, since
it stops reading when the local buffer exhausts, avoiding potential
unexpected delays.
So it sounds like readsome() is intended for streams from a network socket or something like that, and you probably want to just use read().
If no character is available (i.e. gptr() == egptr() for the std::streambuf), the virtual member function showmanyc() is called. One could have an implementation of showmanyc() which returns an error code. Why that may be useful is a different question. However, this could set eofbit. Of course, in_avail() is meant not to fail and not to block, and just returns the number of characters known to be available. That is, the loop you have above is essentially guaranteed to be an infinite loop unless you have a rather odd stream buffer.
Others have answered why readsome won't set eofbit by design. I will suggest an intuitive way to read some bytes until eof without setting failbit, in the same way you were trying to use readsome. This is the result of research for another question.
#include <iostream>
#include <fstream>
#include <sstream>
using namespace std;
streamsize Read(istream &stream, char *buffer, streamsize count)
{
// This consistently fails on gcc (linux) 4.8.1 with failbit set on read
// failure. This apparently never fails on VS2010 and VS2013 (Windows 7)
streamsize reads = stream.rdbuf()->sgetn(buffer, count);
// This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
// failure of the previous sgetn()
stream.rdstate();
// On gcc (linux) 4.8.1 and VS2010/VS2013 (Windows 7) this consistently
// sets eofbit when the stream is at EOF as a consequence of sgetn(). It
// should also throw if exceptions are set, or return on the contrary,
// and previous rdstate() restored a failbit on Windows. On Windows most
// of the times it sets eofbit even on real read failure
stream.peek();
return reads;
}
int main(int argc, char *argv[])
{
ifstream instream("filepath", ios_base::in | ios_base::binary);
while (!instream.eof())
{
char buffer[0x4000];
streamsize read = Read(instream, buffer, sizeof(buffer));
// Do something with buffer
}
}