Reading with setw: to eof or not to eof? - c++

Consider the following simple example:
#include <string>
#include <sstream>
#include <iomanip>
using namespace std;
int main() {
string str = "string";
istringstream is(str);
is >> setw(6) >> str;
return is.eof();
}
At first sight, since an explicit width is specified by the setw manipulator, I'd expect the >> operator to finish reading the string after successfully extracting the requested number of characters from the input stream. I don't see any immediate reason for it to try to extract a seventh character, so I don't expect the stream to enter the eof state.
When I run this example under MSVC++, it works as I expect: the stream remains in a good state after reading. In GCC, however, the behavior is different: the stream ends up in the eof state.
The language standard gives the following list of completion conditions for this version of the >> operator:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
Given the above, I don't see any reason for the >> operator to drive the stream into the eof state in the above code.
However, this is what the implementation of the >> operator in the GCC library looks like:
...
__int_type __c = __in.rdbuf()->sgetc();
while (__extracted < __n
&& !_Traits::eq_int_type(__c, __eof)
&& !__ct.is(__ctype_base::space,
_Traits::to_char_type(__c)))
{
if (__len == sizeof(__buf) / sizeof(_CharT))
{
__str.append(__buf, sizeof(__buf) / sizeof(_CharT));
__len = 0;
}
__buf[__len++] = _Traits::to_char_type(__c);
++__extracted;
__c = __in.rdbuf()->snextc();
}
__str.append(__buf, __len);
if (_Traits::eq_int_type(__c, __eof))
__err |= __ios_base::eofbit;
__in.width(0);
...
As you can see, at the end of each successful iteration it attempts to fetch the next __c character for the next iteration, even though that iteration might never occur. After the loop it examines the last value of __c and sets eofbit accordingly.
So, my question is: is triggering the eof stream state in the above situation, as GCC does, legal from the standard's point of view? I don't see it explicitly specified in the document. Are both MSVC's and GCC's behaviors compliant? Or is only one of them behaving correctly?
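For comparison, here is a minimal sketch of the other reading of the completion conditions, written directly against the streambuf interface: the count is checked before the next character is fetched, so extracting exactly n characters never looks at the end of the input. The name extract_width_lazy is hypothetical; this is not how either library is actually implemented, and it leaves out the sentry, width(0) reset and exception handling that the real operator>> performs.

```cpp
#include <cassert>
#include <istream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: extracts up to n (n > 0) non-whitespace characters into
// out, but checks the count *before* peeking at the next character, so reading
// exactly n characters never touches end-of-file.
std::istream& extract_width_lazy(std::istream& in, std::string& out, std::size_t n)
{
    using traits = std::istream::traits_type;
    std::streambuf* sb = in.rdbuf();
    const std::locale loc = in.getloc();
    out.clear();

    // Skip leading whitespace, as skipws would.
    traits::int_type c = sb->sgetc();
    while (!traits::eq_int_type(c, traits::eof()) &&
           std::isspace(traits::to_char_type(c), loc))
        c = sb->snextc();

    while (out.size() < n) {
        if (traits::eq_int_type(c, traits::eof())) {
            in.setstate(std::ios_base::eofbit);  // input really ran out
            break;
        }
        if (std::isspace(traits::to_char_type(c), loc))
            break;                               // word ended on whitespace
        out.push_back(traits::to_char_type(c));
        sb->sbumpc();                            // consume the stored character
        if (out.size() < n)
            c = sb->sgetc();                     // peek only if more are needed
    }
    if (out.empty())
        in.setstate(std::ios_base::failbit);     // nothing was extracted
    return in;
}
```

With this ordering, reading "string" with n = 6 leaves the stream good, while reading "str" with n = 6 sets eofbit because the input genuinely ends early.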

The definition for that particular operator>> is not relevant to the setting of the eofbit, as it only describes when the operation terminates, but not what triggers a particular bit.
The description for the eofbit in the standard (draft) says:
eofbit - indicates that an input operation reached the end of an input sequence;
I guess here it depends on how you want to interpret "reached". Note that the gcc implementation correctly does not set failbit, which is defined as
failbit - indicates that an input operation failed to read the expected characters, or
that an output operation failed to generate the desired characters.
So I think eofbit does not necessarily mean that the end of file impeded the extractions of any new characters, just that the end of file has been "reached".
I can't find a more precise description of "reached", so I guess it is implementation-defined. If this logic is correct, then both the MSVC and gcc behaviors are correct.
EDIT: In particular, it seems that eofbit gets set when sgetc() would return eof. This is described both in the istreambuf_iterator section and in the basic_istream::sentry section. So now the question is: when is the current position of the stream allowed to advance?
FINAL EDIT: It turns out that probably g++ has the correct behavior.
Every character scan passes through <locale>, in order to allow different character sets, money formats, time descriptions and number formats to be parsed. While there does not seem to be a thorough description of how operator>> works for strings, there are very specific descriptions of how the do_get functions for numbers, time and money are supposed to operate. You can find them from page 687 of the draft onward.
All of these start off by reading a ctype (the "global" version of a character, as read through locales) from an istreambuf_iterator (for numbers, you can find the call definitions at page 1018 of the draft). Then the ctype is processed, and finally the iterator is advanced.
So, in general, this requires the internal iterator to always point to the next character after the last one read; if that was not the case you could in theory extract more than you wanted:
string str = "strin1";
istringstream is(str);
is >> setw(6) >> str;
int x;
is >> x;
If, after the extraction into str, the read position of is were not at end-of-file, then the standard would require x to get the value 1, since for numeric extraction the standard explicitly requires the iterator to be advanced after the first read.
Since this does not make much sense, and given that all complex extractions described in the standard behave the same way, it makes sense that the same would happen for strings. Thus, since the read position of is after reading 6 characters falls on end-of-file, eofbit needs to be set.
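Whatever one concludes about eofbit, the observable result of the "strin1" example does not depend on it: setw(6) consumes all six characters, so the following int extraction has nothing left to read on either implementation. A small sketch using only standard behavior (the helper name word_then_int is made up for illustration):

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: reads a width-limited word, then tries to read an int.
// Returns true only if the int extraction succeeded.
bool word_then_int(const std::string& input, std::string& word, int& x)
{
    std::istringstream is(input);
    is >> std::setw(6) >> word;   // at most 6 characters go into word
    // Whether eofbit is already set here is what GCC and MSVC disagree on;
    // the extraction below fails either way when no characters remain.
    return static_cast<bool>(is >> x);
}
```

On "strin1" the int extraction fails (the value of x is not checked, since it can differ depending on whether the sentry or the conversion is what fails); on "strin 1" the word stops at the space and x becomes 1.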

Related

integers, chars and floating points in structs

So, I'm having some issues with my c++ code. I have the following code, but so far I can't get most of the data stored into the structured data type.
//structured data declaration
struct item
{
int itemCode;
char description[20];
float price;
};
The input code looks like this:
cout << setprecision(2) << fixed << showpoint;
ofstream salesFile ("Sales.txt");
ifstream stockFile ("Stock.txt");
for (counter = 0; counter < 9; counter++)
{
stockFile >> instock[counter].itemCode;
stockFile.getline (instock[counter].description, 20);
stockFile >> instock[counter].price;
}
The output should have looked like:
1234 "description here" 999.99
Quantity X
And this was the output:
1234 0.00
Quantity 5
If you have a file format that is of the form (for one entry)
1234
description here
999.99
(across multiple lines) then the explanation is simple
The reading code in your loop, which does
stockFile >> instock[counter].itemCode;
stockFile.getline (instock[counter].description, 20);
stockFile >> instock[counter].price;
works in this sequence:
The value of instock[counter].itemCode will receive the value 1234. However (and this is important to understand) the newline after the 1234 will still be waiting in the stream to be read.
The call of getline() will encounter the newline, and return immediately. instock[counter].description will contain the string "".
The expression stockFile >> instock[counter].price will encounter the d in description. This cannot be interpreted as a numeric value, so the extraction fails and sets failbit on stockFile. Since C++11, a failed numeric conversion also stores 0 in instock[counter].price (under C++03 rules it was left unchanged).
That accounts for the 0.00 in your output. And because failbit is now set, every later extraction in the loop fails immediately, so the remaining entries are never read.
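The sequence above can be reproduced without a file. In this sketch a std::istringstream stands in for Stock.txt (an assumption made only to keep the example self-contained), one record is read with the struct from the question, and C++11 or later is assumed so that the failed float conversion stores 0. The function name read_like_question is illustrative.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct item
{
    int itemCode;
    char description[20];
    float price;
};

// Reads one record exactly the way the loop in the question does and
// reports whether the stream is still usable afterwards.
bool read_like_question(std::istream& stockFile, item& it)
{
    stockFile >> it.itemCode;               // stops before the '\n'
    stockFile.getline(it.description, 20);  // consumes only that '\n'
    stockFile >> it.price;                  // sees 'd', conversion fails
    return static_cast<bool>(stockFile);
}
```

Even if price held 999.99 beforehand, it ends up as 0 and the stream is left in a failed state.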
The real problem is that you are mixing styles of input on the one stream. In this case, mixing usage of streaming operators >> with use of line-oriented input (getline()). As per my description of the sequence above, different styles of input interact in different ways, because (as in this case) they behave differently when encountering a newline.
Some people will just tell you to skip over the newline after reading instock[counter].itemCode. That advice is flawed, since it doesn't cope well with changes (e.g. what happens if the file format changes to include an additional field on another line, or if the file isn't "quite" in the expected format for some reason?).
The more general solution is to avoid mixing styles of input on the one stream. A common way would be to use getline() to read all data from the stream (i.e. not use >> to interact directly with stockFile). Then interpret/parse each string to find the information needed.
Incidentally, rather than using arrays of char to hold a string, try using the standard std::string (from standard header <string>). This has the advantage that std::string can adjust its length as needed. std::getline() also has an overload that can happily read to an std::string. Once data is read from your stream as an std::string, it can be interpreted as needed.
There are many ways of interpreting a string (e.g. to extract integral values from it). I'll leave finding an approach for that as an exercise - you will learn more by doing it yourself.
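For comparison once you have tried it yourself, here is one possible sketch of the getline-only approach described above. It assumes the three-lines-per-record layout shown earlier, uses std::string for the description, and converts the numeric lines with std::stoi and std::stof. The names Item and read_items are illustrative, not part of the question's code.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative record type: std::string instead of char[20].
struct Item
{
    int itemCode;
    std::string description;
    float price;
};

// Reads records of three lines each (code, description, price) using only
// std::getline, then converts the numeric lines. std::stoi/std::stof throw
// std::invalid_argument if a numeric line cannot be parsed.
std::vector<Item> read_items(std::istream& in)
{
    std::vector<Item> items;
    std::string code, desc, price;
    while (std::getline(in, code) && std::getline(in, desc) &&
           std::getline(in, price))
    {
        items.push_back(Item{std::stoi(code), desc, std::stof(price)});
    }
    return items;
}
```

Because every field is read as a whole line first, a newline can never be left behind for the next read, and a malformed numeric line surfaces as an exception instead of a silently failed stream.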

Why does std::ios_base::ignore() set the EOF bit?

When I read all data from a stream, but make no attempt to read past its end, the stream's EOF is not set. That's how C++ streams work, right? It's the reason this works:
#include <sstream>
#include <cassert>
char buf[255];
int main()
{
std::stringstream ss("abcdef");
ss.read(buf, 6);
assert(!ss.eof());
assert(ss.tellg() == 6);
}
However, if instead of read()ing data I ignore() it, EOF is set:
#include <sstream>
#include <cassert>
int main()
{
std::stringstream ss("abcdef");
ss.ignore(6);
assert(!ss.eof()); // <-- FAILS
assert(ss.tellg() == 6); // <-- FAILS
}
This is on GCC 4.8 and GCC trunk (Coliru).
It also has the unfortunate side effect of making tellg() return -1 (tellg() constructs a sentry, which sets failbit because eofbit is already set, and then returns -1), which is annoying for what I'm doing.
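The -1 comes from the sentry rather than from ignore() itself: since C++11, tellg() constructs a sentry, which sets failbit when eofbit is already set, and then returns pos_type(-1). This sketch reproduces that portably by setting eofbit by hand (standing in for the eofbit that ignore(6) set in the question's GCC build) and shows that clear() makes the position available again. The function name is illustrative.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <utility>

// Returns the tellg() values observed (a) with eofbit set and (b) after clear().
std::pair<std::streamoff, std::streamoff> tellg_with_and_without_eof()
{
    std::stringstream ss("abcdef");
    char buf[6];
    ss.read(buf, 6);                       // position 6; read() leaves eofbit clear
    ss.setstate(std::ios_base::eofbit);    // simulate the eofbit ignore(6) set
    std::streamoff with_eof = ss.tellg();  // sentry sets failbit, returns -1
    ss.clear();                            // reset eofbit and failbit
    std::streamoff after_clear = ss.tellg();
    return {with_eof, after_clear};
}
```

So calling clear() after ignore() is one way to get a meaningful tellg() back, at the cost of discarding the state bits.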
Is this standard-mandated? If so, which passage and why? Why would ignore() attempt to read more than I told it to?
I can't find any reason for this behaviour on cppreference's ignore() page. I can probably .seekg(6, std::ios::cur) instead, right? But I'd still like to know what's going on.
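The seekg() alternative suggested in the question avoids the issue, because seeking does not extract characters and so never runs into end-of-file. A sketch, assuming the stream is seekable (true for std::stringstream, not necessarily for a stream attached to a pipe); skip_by_seeking is a made-up name.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

// Advances the get position by n characters without extracting them.
// Returns false if the seek failed (e.g. past the end, or an unseekable stream).
bool skip_by_seeking(std::istream& in, std::streamoff n)
{
    in.seekg(n, std::ios_base::cur);
    return !in.fail();
}
```

Unlike ignore(), a failed seek is reported through failbit rather than eofbit, so the two cases stay distinguishable.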
I think this is a libstdc++ bug (42875, h/t NathanOliver). The requirements on ignore() in [istream.unformatted] are:
Characters are extracted until any of the following occurs:
- n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
- end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
- traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Remarks: The last condition will never occur if traits::eq_int_type(delim, traits::eof()).
So only two conditions apply here (the last cannot occur, because ignore(6) uses the default delimiter traits::eof()): we either extract n characters, or we hit end-of-file first, in which case eofbit is set. In this case we are able to extract n characters (there are in fact 6 characters in your stream), so end-of-file never occurs on the input sequence and eofbit should not be set.
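Read literally, the quoted conditions can be checked in this order: stop as soon as n characters have been extracted, and only otherwise fetch another character. A minimal sketch under that reading, written against std::streambuf; ignore_per_wording is a hypothetical name, and it omits the sentry, gcount() bookkeeping and exception handling of the real member function.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>

// Discards up to n characters (or up to and including delim), setting eofbit
// only when end-of-file is actually encountered first.
// Returns the number of characters extracted.
std::streamsize ignore_per_wording(std::istream& in, std::streamsize n,
                                   std::istream::int_type delim =
                                       std::istream::traits_type::eof())
{
    using traits = std::istream::traits_type;
    const bool unbounded = (n == std::numeric_limits<std::streamsize>::max());
    std::streambuf* sb = in.rdbuf();
    std::streamsize extracted = 0;

    while (unbounded || extracted < n) {        // first condition checked first
        traits::int_type c = sb->sbumpc();
        if (traits::eq_int_type(c, traits::eof())) {
            in.setstate(std::ios_base::eofbit); // second condition
            break;
        }
        ++extracted;
        // Third condition (c is extracted). It can never match when delim is
        // eof(), because c is known not to be eof() at this point.
        if (traits::eq_int_type(c, delim))
            break;
    }
    return extracted;
}
```

Under this ordering, skipping all six characters of "abcdef" leaves eofbit clear and tellg() at 6, while asking for six characters from "abc" stops after three and sets eofbit.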
In libc++, eof() is not set and tellg() does return 6.

istream::getline failbit not getting set?

#include <iostream>
#include <sstream>
#include <fstream>
using namespace std;
int main(){
istringstream input("1234");
char c[5];
while(input.getline(c, 5, '\n')){
cout << "OUTPUT: " << c << endl;
}
}
The output is
OUTPUT: 1234
when I feel like all sources tell me input should test as false and there should be no output. From the standard (N3337) [27.7.2.3]/18:
Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
Since 4 characters get stored, failbit should be getting set. Some other sources give slightly different but still confusing descriptions of this function. Cplusplus.com says:
The failbit flag is set if the function extracts no characters, or if the delimiting character is not found once (n-1) characters have already been written to s. Note that if the character that follows those (n-1) characters in the input sequence is precisely the delimiting character, it is also extracted and the failbit flag is not set (the extracted sequence was exactly n characters long).
Again, the delimiting character '\n' is not found after the 4, so failbit should be getting set. Cppreference says something similar. What am I missing here?
Yes, it reads n-1 characters and never encounters '\n', but you missed the first point:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
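Since you read exactly what was in the stream, end-of-file occurs first: eofbit gets set (failbit does not), so the stream still tests as true and you get the output. On the next call the sentry sees eofbit, sets failbit, and the loop ends.
If we add
std::cout << input.eof();
you can see that this is indeed what happened (it prints 1).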
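The two calls can be observed side by side by recording the stream state after each one. In this sketch (getline_states and CallResult are illustrative names), the first getline() on "1234" stores the text and sets only eofbit, and the second fails. For contrast, on "12345" the fifth character is neither end-of-file nor the delimiter, so the first call already sets failbit.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct CallResult
{
    bool succeeded;   // what the while-condition would see
    bool eof;
    bool fail;
    std::string text;
};

// Calls getline(c, 5, '\n') twice on the given input and records each outcome.
std::vector<CallResult> getline_states(const std::string& text)
{
    std::istringstream input(text);
    std::vector<CallResult> results;
    for (int i = 0; i < 2; ++i) {
        char c[5] = {};
        bool ok = static_cast<bool>(input.getline(c, 5, '\n'));
        results.push_back({ok, input.eof(), input.fail(), c});
    }
    return results;
}
```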

Taking input from cin and storing it in a char variable

I'm taking input using cin and storing it into a char variable. My question is if there is any input that could cause cin.fail() to return true.
I know that trying to store input such as "foo" into an int variable will fail, but is there any case in which this is possible with a char variable?
The overloads of operator>> that take a char follow the normal behavior of a formatted input function, that is, they call rdbuf()->sbumpc() or rdbuf()->sgetc() to perform the extraction. Naturally, if end-of-file is encountered, eofbit is set. If one of those functions throws an exception, badbit is set. If either of these is set, failbit is set as well. There's no evidence to indicate that the operation would fail otherwise. (This is covered under section [istream] in the C++11 draft standard.) For other types, like int, num_get::do_get() is used to convert the characters (similar to scanf). Of course, that conversion can fail, but no conversion is needed if the input is already a char.
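In practice, then, extraction into a char fails only when no character is available. Because operator>> skips leading whitespace by default (skipws), that includes input consisting of nothing but whitespace. A sketch with std::istringstream standing in for std::cin; read_char is an illustrative name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true and stores the character if `is >> c` succeeds.
bool read_char(const std::string& input, char& c)
{
    std::istringstream is(input);
    return static_cast<bool>(is >> c);
}
```

"foo" yields 'f' and "  7" yields '7', while an empty string or a string of spaces and newlines fails. With std::noskipws set, the whitespace itself would be extracted instead.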
Now the comments are misleading. CTRL+C would kill the application in Linux. CTRL+Z would send a character that signals EOF on some operating systems.
You can even use an emoji and it would work:
#include <iostream>
int main()
{
char c;
if (std::cin >> c)
std::cout << "Huzzah!";
}
With input 😁 outputs "Huzzah!" as expected.
I guess not, because a char variable will just take the first character from the given input, no matter how long the input is or what type it represents (int, long, double...).
No, failbit is only set if there's a logical error reading the input stream, AKA, someone rips out the USB flashdrive containing the file from which you're reading. ;)

Different EOF behavior with read versus ignore

I was recently just tripped up by a subtle distinction between the behavior of std::istream::read versus std::istream::ignore. Basically, read extracts N bytes from the input stream, and stores them in a buffer. The ignore function extracts N bytes from the input stream, but simply discards them rather than storing them in a buffer. So, my understanding was that read and ignore are basically the same in every way, except for the fact that read saves the extracted bytes whereas ignore just discards them.
But there is another subtle difference between read and ignore which managed to trip me up. If you read to the end of a stream, the EOF condition is not triggered. You have to read past the end of a stream in order for the EOF condition to be triggered. But with ignore it is different: you only need to read to the end of a stream.
Consider:
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
{
std::stringstream ss;
ss << "abcd";
char buf[1024];
ss.read(buf, 4);
std::cout << "EOF: " << std::boolalpha << ss.eof() << std::endl;
}
{
std::stringstream ss;
ss << "abcd";
ss.ignore(4);
std::cout << "EOF: " << std::boolalpha << ss.eof() << std::endl;
}
}
On GCC 4.4.5, this prints out:
EOF: false
EOF: true
So, why is the behavior different here? This subtle difference managed to confuse me enough to wonder why there is a difference. Is there some compelling reason that EOF is triggered "early" with a call to ignore?
eof() should only return true if you have already attempted to read past the end. In neither case should it be true. This may be a bug in your implementation.
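The rule stated here is easy to check with read(), whose end-of-file behavior the standard specifies directly: asking for exactly the remaining characters leaves eofbit clear, and asking for more sets eofbit together with failbit, since read() could not deliver the requested count. A sketch using the "abcd" stream from the question; read_from_abcd and ReadOutcome are illustrative names.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

struct ReadOutcome { bool eof; bool fail; std::streamsize got; };

// Reads n characters (n <= 16) from a stream containing "abcd".
ReadOutcome read_from_abcd(std::streamsize n)
{
    std::stringstream ss;
    ss << "abcd";
    char buf[16];
    ss.read(buf, n);
    return {ss.eof(), ss.fail(), ss.gcount()};
}
```

Reading 4 characters reports eof() == false with gcount() == 4; reading 5 reports eof() and fail() both true, again with gcount() == 4.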
I'm going to go out on a limb here and answer my own question: it really looks like this is a bug in GCC.
The standard reads in 27.6.1.3 paragraph 23:
[istream::ignore] behaves as an unformatted input function (as described in 27.6.1.3, paragraph 1). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
if n != numeric_limits<streamsize>::max() (18.2.1), n characters are extracted
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.4.4.3));
c == delim for the next available input character c (in which case c is extracted).
Note: The last condition will never occur if delim == traits::eof()
My (somewhat tentative) interpretation is that GCC is wrong here, because of the wording quoted above: ignore behaves as an unformatted input function (like read()), which means that end-of-file should only occur on the input sequence if there is an attempt to extract additional bytes after the last byte in the stream has been extracted.
I'll submit a bug report if I find that enough people agree with this answer.
The consensus seemed to be that this was a legitimate bug in gcc. Since I saw no indication a bug report had been filed, I'm doing so now. The report can be viewed at:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51651