istream eof discrepancy between libc++ and libstdc++ - c++

The following (toy) program prints different things when linked against libstdc++ and libc++. Is this a bug in libc++, or do I not understand how istream eof() works? I have tried running it using g++ on Linux and Mac OS X and clang on Mac OS X, with and without -std=c++0x. It was my impression that eof() does not return true until an attempt to read (by get() or something else) actually fails. This is how libstdc++ behaves, but not how libc++ behaves.
#include <iostream>
#include <sstream>

int main() {
    std::stringstream s;
    s << "a";
    std::cout << "EOF? " << (s.eof() ? "T" : "F") << std::endl;
    std::cout << "get: " << s.get() << std::endl;
    std::cout << "EOF? " << (s.eof() ? "T" : "F") << std::endl;
    return 0;
}
Thor:~$ g++ test.cpp
Thor:~$ ./a.out
EOF? F
get: 97
EOF? F
Thor:~$ clang++ -std=c++0x -stdlib=libstdc++ test.cpp
Thor:~$ ./a.out
EOF? F
get: 97
EOF? F
Thor:~$ clang++ -std=c++0x -stdlib=libc++ test.cpp
Thor:~$ ./a.out
EOF? F
get: 97
EOF? T
Thor:~$ clang++ -stdlib=libc++ test.cpp
Thor:~$ ./a.out
EOF? F
get: 97
EOF? T

EDIT: This was due to the way older versions of libc++ interpreted the C++ standard. The interpretation was discussed in LWG issue 2036, it was ruled to be incorrect and libc++ was changed.
Current libc++ gives the same results on your test as libstdc++.
old answer:
Your understanding is correct.
istream::get() does the following:
Constructs a sentry, which calls good() and sets failbit if it returns false (this adds failbit to a stream that already had some other bit set) (§27.7.2.1.2 [istream::sentry]/2),
Flushes whatever stream is tie()'d, if necessary,
If good() is false at this point, returns traits_type::eof() and does nothing else,
Extracts a character as if by calling rdbuf()->sbumpc() or rdbuf()->sgetc() (§27.7.2.1 [istream]/2),
If sbumpc() or sgetc() returned eof, sets eofbit (§27.7.2.1 [istream]/3) and failbit (§27.7.2.2.3 [istream.unformatted]/4),
If an exception was thrown, sets badbit (§27.7.2.2.3 [istream.unformatted]/1) and rethrows if allowed,
Updates gcount and returns the character (or eof if it couldn't be obtained).
(chapters quoted from C++11, but C++03 has all the same rules, under §27.6.*)
Now let's take a look at the implementations:
libc++ (current svn version) defines the relevant part of get() as
sentry __s(*this, true);
if (__s)
{
    __r = this->rdbuf()->sbumpc();
    if (traits_type::eq_int_type(__r, traits_type::eof()))
        this->setstate(ios_base::failbit | ios_base::eofbit);
    else
        __gc_ = 1;
}
libstdc++ (as shipped with gcc 4.6.2) defines the same part as
sentry __cerb(*this, true);
if (__cerb)
{
    __try
    {
        __c = this->rdbuf()->sbumpc();
        // 27.6.1.1 paragraph 3
        if (!traits_type::eq_int_type(__c, __eof))
            _M_gcount = 1;
        else
            __err |= ios_base::eofbit;
    }
    [...]
    if (!_M_gcount)
        __err |= ios_base::failbit;
As you can see, both libraries call sbumpc() and set eofbit if and only if sbumpc() returned eof.
Your testcase produces the same output for me using recent versions of both libraries.

This was a libc++ bug and has been fixed as Cubbi noted. My bad. Details are here:
http://lwg.github.io/issues/lwg-closed.html#2036

The value of s.eof() is unspecified in the second call—it may be
true or false, and it might not even be consistent. All you can say is
that if s.eof() returns true, all future input will fail (but if it
returns false, there's no guarantee that future input will succeed).
After failure (s.fail()), if s.eof() returns true, it's likely (but
not 100% certain) that the failure was due to end of file. It's worth
considering the following scenario, however:
double test;
std::istringstream s1("");
s1 >> test;
std::cout << (s1.fail() ? "T" : "F") << (s1.eof() ? "T" : "F") << std::endl;
std::istringstream s2("1.e-");
s2 >> test;
std::cout << (s2.fail() ? "T" : "F") << (s2.eof() ? "T" : "F") << std::endl;
On my machine, both lines are "TT", despite the fact that the first
failed because there was no data (end of file), the second because the
floating point value was incorrectly formatted.

eofbit is set when an operation tries to read past the end of the file; the operation itself may still succeed. For example, if you are reading an integer and there is no newline after it, I expect eofbit to be set but the read of the integer to succeed. That is, I get (and expect) FT for
#include <iostream>
#include <sstream>

int main() {
    std::stringstream s("12");
    int i;
    s >> i;
    std::cout << (s.fail() ? "T" : "F") << (s.eof() ? "T" : "F") << std::endl;
    return 0;
}
Here I don't expect istream::get to try to read past the returned character (i.e. I don't expect it to hang until I enter the next line if I read a '\n' with it), so libstdc++ indeed seems right, at least from a quality-of-implementation point of view.
The standard's description of istream::get just says "extracts a character c, if one is available" without describing how, and so doesn't seem to rule out libc++'s behavior.

Related

errno doesn't change after putting a negative value in sqrt()

With errno, I am trying to check whether cmath functions produce a valid result. But even after I pass a negative value to sqrt() or log(), errno stays at value 0.
Does anyone know why, and how I can make errno behave correctly?
The environment is macOS Monterey version 12.6.1, the compiler is gcc version 11.3.0 (Homebrew GCC 11.3.0_1) or Apple clang version 14.0.0 (clang-1400.0.29.202) (I tried the 2 compilers).
The compile command is g++ test_errno.cpp -o test_errno -std=c++14.
The piece of code I tried is directly copied from this page. The following is the code.
#include <iostream>
#include <cmath>
#include <cerrno>
#include <cstring>
#include <clocale>

int main()
{
    double not_a_number = std::log(-1.0);
    std::cout << not_a_number << '\n';
    if (errno == EDOM) {
        std::cout << "log(-1) failed: " << std::strerror(errno) << '\n';
        std::setlocale(LC_MESSAGES, "de_DE.utf8");
        std::cout << "Or, in German, " << std::strerror(errno) << '\n';
    }
}
Its result didn't print the error messages, which should have been printed if errno were set correctly.
It seems that on macOS errno is not used, per the bug report linked from #GAVD's comment.
I could check that via the math_errhandling value, from #Pete Becker's comment.
There are two ways of math error handling in C/C++: either with errno or with floating-point exceptions, as the second link above shows.
We can check which way (or both of them) the system's math library employs, via checking if macro constant math_errhandling is equal to MATH_ERREXCEPT or MATH_ERRNO, like the following (copied from the 2nd link):
std::cout << "MATH_ERRNO is "
          << (math_errhandling & MATH_ERRNO ? "set" : "not set") << '\n'
          << "MATH_ERREXCEPT is "
          << (math_errhandling & MATH_ERREXCEPT ? "set" : "not set") << '\n';
And on my system, the output is
MATH_ERRNO is not set
MATH_ERREXCEPT is set
, which means the system does not use errno for reporting math errors, but uses floating-point exceptions instead.
That's why errno stays at value 0 no matter what, and I should have used std::fetestexcept() to check error conditions.
With floating-point exceptions, std::feclearexcept(FE_ALL_EXCEPT); corresponds to errno = 0;, and std::fetestexcept(FE_DIVBYZERO) corresponds to errno == ERANGE for example.
I'm going to take a stab in the dark and guess you are enabling fast-math in your build?
Without fast-math:
https://godbolt.org/z/vMo1P7Mn1
With fast-math:
https://godbolt.org/z/jEsGz7n38
The error handling within cmath tends to break things like vectorisation and constexpr (setting an external global variable is a side effect that breaks both). As a result, you are usually better off checking for domain errors yourself...

How would one generalise `clearerr()` under C++?…

TL;DR
I am aware that if a program listens for EOF (e.g. ^D) as a sign to stop taking input, e.g. by relying on a conditional like while (std::cin) {...}, one needs to call cin.clear() before standard input can be read from again (readers who'd like to know more, see this table).
I recently learned that this is insufficient, and that the underlying C streams, including stdin, need clearerr() to be run to forget their EOF state.
Since clearerr() takes a C-style stream (a FILE*), and C++ operates mainly with std::basic_streambufs and the like (e.g. cin), I want to generalise some code (see below) to run clearerr() on any streambuf's associated FILE*, even if that may not be stdin.
EDITS (1&2):
I wonder if stdin is the only FILE* that ever behaves like this (needing clearerr() to be run)...?
If it isn't, then the following code should end the question of generalisation (idea pointed out by zkoza in their answer)
As zkoza pointed out in their comment below, stdin is the only C stream that would, logically, ever need such treatment (i.e. clearerr()). Checking whether a given C++ stream is actually attached to *std::cin.rdbuf() is all that is needed:
std::istream& theStream = /* some stream with some underlying streambuf */;
if (theStream.rdbuf() == std::cin.rdbuf())
    clearerr(stdin);
Background
I'm writing a tool in C++ where I need to get multiple lines of user input, twice.
I know there are multiple ways of getting multiline input (e.g. waiting for double-newlines), but I want to use EOF as the user's signal that they're done — not unlike when you gpg -s or -e.
After much consultation (here, here, and on cppreference.com), I decided to use... (and I quote the third):
[the] idiomatic C++ input loops such as [...]
while(std::getline(stream, string)){...}
Since these rely on std::basic_ios::operator bool to do their job, I ensured that cin.rdstate() was cleared between the first and second user-input instructions (using cin.clear()).
The gist of my code is as follows:
std::istream& getlines (std::basic_istream<char> &theStream,
                        std::vector<std::string> &stack) {
    std::ios::iostate current_mask (theStream.exceptions());
    theStream.exceptions(std::ios::badbit);
    std::string &_temp (*new std::string);
    while (theStream) {
        if (std::getline(theStream, _temp))
            stack.push_back(_temp); // I'd really like the input broken...
                                    // ... into a stack of `\n`-terminated...
                                    // ... strings each time
    }
    // If `eofbit` is set, clear it,
    // ... since std::basic_istream::operator bool needs `goodbit`
    if (theStream.eof())
        theStream.clear(theStream.rdstate()
                        & (std::ios::failbit | std::ios::badbit));
    // Here the logical AND with
    // ... (failbit OR badbit) unsets eofbit
    // std::getline sets failbit if nothing was extracted
    if (theStream.fail() && !stack.size()) {
        throw std::ios::failure("No input received!");
    }
    else if (theStream.fail() && stack.size()) {
        theStream.clear(theStream.rdstate() & std::ios::badbit);
        clearerr(stdin); // 👈 the part which I want to generalise
    }
    delete &_temp;
    theStream.exceptions(current_mask);
    return theStream;
}
This does what you need:
#include <iostream>
#include <cstdio>

int main()
{
    std::cin.sync_with_stdio(true);
    char c = '1', d = '1';
    std::cout << "Enter a char: \n";
    std::cin >> c;
    std::cout << (int)c << "\n";
    std::cout << std::cin.eof() << "\n";
    std::cin.clear();
    clearerr(stdin);
    std::cout << std::cin.eof() << "\n";
    std::cout << "Enter another char: \n";
    std::cin >> d;
    std::cout << (int)d << "\n";
    std::cout << std::cin.eof() << "\n";
}
It works because C++'s std::cin is tied, by default, with C's stdin (so, the first line is actually not needed). You have to modify your code to check if the stream is std::cin and if so, perform clearerr(stdin);
EDIT:
Actually, sync_with_stdio only ensures synchronization between the C and C++ interfaces; internally they operate on the same underlying file, which may be why clearerr(stdin) works whether or not the interfaces are synchronized with sync_with_stdio.
EDIT2: Do these answer your problem? Getting a FILE* from a std::fstream
https://www.ginac.de/~kreckel/fileno/ ?

Reading from std::cin multiple times -- different behavior on Linux and Mac OS X

I want to read some numbers from the standard input, process them, then read the next bunch of numbers.
I came up with the solution of reading the leftover character into a char and clearing the eofbit, failbit, and badbit. The following code works on Ubuntu 14.04 with GCC 4.9.2:
#include <iostream>
#include <vector>

int main() {
    std::vector<double> a;
    std::vector<double>::iterator it;
    double x;
    while (std::cin >> x) {
        a.push_back(x);
    }
    std::cout << "First bunch of numbers:" << std::endl;
    for (it = a.begin(); it != a.end(); ++it) {
        std::cout << *it << std::endl;
    }
    // get crap out of buffer
    char s;
    std::cin >> s;
    std::cin.clear();
    // go for it again
    while (std::cin >> x) {
        a.push_back(x);
    }
    std::cout << "All the numbers:" << std::endl;
    for (it = a.begin(); it != a.end(); ++it) {
        std::cout << *it << std::endl;
    }
    return 0;
}
So, on Ubuntu I can type 1<Return>2<Return>^D, get some output, type 3<Return>4<Return>^D, get more output and the program terminates.
On Mac OS 10.10 however, using the same GCC version, the program will not accept the second round of input but outputs the first sequence of numbers twice after hitting ^D the first time.
Why is there inconsistent behavior? Is it possible to work around it?
What would be the idiomatic way to accept input twice?
In my use case, the first bunch of numbers may eventually be read from a file or pipeline. How can I ask for additional input interactively in that scenario as well?
I'm not all that familiar with this, but this person has a similar question: Signal EOF in mac osx terminal
By default, OS X (formerly Mac OS X) terminals recognize EOF when control-D is pressed at the beginning of a line.
In detail, the actual operation is that, when control-D is pressed,
all bytes in the terminal’s input buffer are sent to the running
process using the terminal. At the start of a line, no bytes are in
the buffer, so the process is told there are zero bytes available, and
this acts as an EOF indicator.
This procedure doubles as a method of delivering input to the process
before the end of a line: The user may type some characters and press
control-D, and the characters will be sent to the process immediately,
without the usual wait for enter/return to be pressed. After this
“send all buffered bytes immediately” operation is performed, no bytes
are left in the buffer. So, when control-D is pressed a second time,
it is the same as the beginning of a line (no bytes are sent, and the
process is given zero bytes), and it acts like an EOF.
You can learn more about terminal behavior by using the command “man 4
tty” in Terminal. The default line discipline is termios. You can
learn more about the termios line discipline by using the command “man
termios”.
Accepted Answer from Eric Postpischil
I don't have OS X to test against, but maybe this explains the behaviour.

How is (cin) evaluated?

In Bjarne Stroustrup's Programming Principles and Practice Using C++ (Sixth Printing, November 2012), if (cin) and if (!cin) are introduced on p.148 and used in earnest on p.178. while (cin) is introduced on p.183 and used in earnest on p.201.
However, I feel I don't fully understand how these constructs work, so I'm exploring them.
If I compile and run this:
int main()
{
    int i = 0;
    while (cin) {
        cout << "> ";
        cin >> i;
        cout << i << '\n';
    }
}
I get something like:
$ ./spike_001
> 42
42
> foo
0
$
Why is it that entering "foo" apparently causes i to be set to 0?
Why is it that entering "foo" causes cin to be set to false?
Alternatively, if I run and compile this:
int main()
{
    int i = 0;
    while (true) {
        cout << "> ";
        cin >> i;
        cout << i << '\n';
    }
}
I get something like:
$ ./spike_001
> 42
42
> foo
> 0
> 0
...
The last part of user input here is foo. After that is entered, the line > 0 is printed to stdout repeatedly by the program, until it is stopped with Ctrl+C.
Again, why is it that entering "foo" apparently causes i to be set to 0?
Why is it that the user is not prompted for a new value for i on the next iteration of the while loop after foo was entered?
This is using g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3.
I'm sorry for such a long question, but I think it all boils down to, "How is cin evaluated?"
Well, std::cin (or more precisely std::basic_ios) has an operator bool to implicitly convert the std::cin object to a bool when requested (for example in your if evaluation).
The semantic is as follows:
Checks whether the stream has no errors. Returns true if the stream has no errors and is ready for I/O operations. Specifically, returns !fail().
Why is it that entering "foo" apparently causes i to be set to 0?
Again, why is it that entering "foo" apparently causes i to be set to 0?
Because operator>> on an int expects an integer in the string.
From cppreference:
If extraction fails (e.g. if a letter was entered where a digit is expected), value is left unmodified and failbit is set.
(Note: since C++11, a failed extraction writes 0 to the value instead of leaving it unmodified.)
Why is it that entering "foo" causes cin to be set to false?
Because the fail bit is set, therefore leading fail() to return true.
Why is it that the user is not prompted for a new value for i on the next iteration of the while loop after foo was entered?
That is because std::cin has the fail bit set, so it fails to get the input and just prints the value of i, which is 0.
When you use the >> operator on a stream, it attempts to extract a value of that type from the stream. In other words, when you do cin >> i where i is an int, the stream attempts to pull an int from the stream. If this succeeds, i is set to the extracted value, and all is well. If it fails, the stream's failbit is set. This is important because treating a stream as a bool is equivalent to checking that neither failbit nor badbit is set (if (cin) is like if (!cin.fail())).
So anyway... what's happening is that foo is setting the failbit since the extraction fails. Really, you should be checking whether the extraction succeeds directly:
if (cin >> i) { /* success */ }
On a code quality note, I suspect you're using using namespace std;. Please be aware that this can be harmful.

Why does my program produce different results on Windows and Linux, about file reading with ifstream?

I have a program shown as follows. For it I have several questions:
1). Why does it produce different results on different platforms? I'll paste the screen-shots later.
2). I'm using a fail() method to check if the "file.read()" failed. Is this correct? I use fail() method because this web page says this:
The function returns true if either the failbit or the badbit is set. At least one of these flags is set when some error other than reaching the End-Of-File occurs during an input operation.
But later I read this page about istream::read() here. It says the eofbit and failbit would always be set at the same time. Does this mean that a normal EOF situation would also result in fail() returning true? That seems to conflict with "other than reaching the End-Of-File occurs"...
Could anyone help me clarify how I am supposed to use these methods? Should I use bad() instead?
My program
#include <iostream>
#include <fstream>
#include <cstring>
#include <new>

using namespace std;

#ifdef WIN32
char * path = "C:\\Workspace\\test_file.txt";
#else
char * path = "/home/robin/Desktop/temp/test_file.txt";
#endif

int main(int argc, char * argv[])
{
    ifstream file;
    file.open(path);
    if (file.fail())
    {
        cout << "File open failed!" << endl;
        return -1; // If the file open fails, quit!
    }
    // Calculate the total length of the file so I can allocate a buffer
    file.seekg(0, std::ios::end);
    size_t fileLen = file.tellg();
    cout << "File length: " << fileLen << endl;
    file.seekg(0, std::ios::beg);
    // Now allocate the buffer
    char * fileBuf = new (std::nothrow) char[fileLen + 1];
    if (NULL == fileBuf)
        return -1;
    ::memset((void *)fileBuf, 0, fileLen + 1); // Zero the buffer
    // Read the file into the buffer
    file.read(fileBuf, fileLen);
    cout << "eof: " << file.eof() << endl
         << "fail: " << file.fail() << endl
         << "bad: " << file.bad() << endl;
    if (file.fail())
    {
        cout << "File read failed!" << endl;
        delete [] fileBuf;
        return -1;
    }
    // Close the file
    file.close();
    // Release the buffer
    delete [] fileBuf;
    return 0;
}
The test_file.txt content (shown with "vim -b"; it's a very simple file): [screenshot not included]
Result on Windows (Visual Studio 2008 SP1): [screenshot not included]
Result on Linux (gcc 4.1.2): [screenshot not included]
Does this mean that a normal EOF situation would also result in that fail() returns true? This seems to conflict with "other than reaching the End-Of-File occurs".
I recommend using a reference that isn't full of mistakes.
http://en.cppreference.com/w/cpp/io/basic_ios/fail says:
Returns true if an error has occurred on the associated stream. Specifically, returns true if badbit or failbit is set in rdstate().
And the C++ standard says:
Returns: true if failbit or badbit is set in rdstate().
There's no "other than end-of-file" thing. An operation that tries to read past the end of the file, will cause failbit to set as well. The eofbit only serves to distinguish that specific failure reason from others (and that is not as useful as one might think at first).
I'm using a fail() method to check if the "file.read()" failed. Is this correct?
You should simply test with conversion to bool.
if(file) { // file is not in an error state
It's synonymous with !fail(), but it's more usable, because you can use it to test directly the result of a read operation without extra parenthesis (things like !(stream >> x).fail() get awkward):
if(file.read(fileBuf, fileLen)) { // read succeeded
You will notice that all read operations on streams return the stream itself, which is what allows you to do this.
Why does it produce different results on different platforms?
The difference you're seeing between Windows and Linux is because the file is open in text mode: newline characters will be converted silently by the implementation. This means that the combination "\r\n" (used in Windows for newlines) will be converted to a single '\n' character in Windows, making the file have only 8 characters. Note how vim shows a ^M at the end of the first line: that's the '\r' part. In Linux a newline is just '\n'.
You should open the file in binary mode if you want to preserve the original as is:
file.open(path, std::ios_base::in | std::ios_base::binary);
I guess the problem here with the different execution is the DOS (Windows) vs. UNIX text-file convention.
In DOS, a line ends with <CR><LF>, and in text mode this pair is read/written as a single '\n'. Thus, on Windows your file is at its end, but on UNIX it is not, since one character is left.