std::ifstream::read or std::ofstream::write with a zero parameter? - c++

Is it perfectly OK (= well-defined behaviour according to the standard) to call:
mystream.read(buffer, 0);
or
mystream.write(buffer, 0);
(and of course nothing will be read or written).
I would like to know whether I have to test if the provided size is zero before calling one of these two functions.

Yes, the behavior is well-defined: both functions will go through the motions for unformatted input/output functions (constructing the sentry, setting failbit if eofbit is set, flushing the tied stream if necessary), and then they will get to this clause:
§27.7.2.3[istream.unformatted]/30
Characters are extracted and stored until either of the following occurs:
— n characters are stored;
§27.7.3.7[ostream.unformatted]/5
Characters are inserted until either of the following occurs
— n characters are inserted;
"zero characters are stored/inserted" is true before anything is extracted or inserted.
Looking at actual implementations, I see for (; gcount < n; ++gcount) in libc++ or sgetn(buffer, n); in libstdc++, which has the equivalent loop.

Another excerpt, from 27.7.2.3 Unformatted input functions/1, gives us a clue that zero-size input buffers are a valid case:
unformatted input functions taking a character array of non-zero size as an argument shall also store a null character (using charT()) in the first location of the array.

When is calling basic_filebuf::pbackfail allowed/defined to succeed

On implementing a basic_filebuf I stumbled over basic_filebuf::pbackfail and don't fully understand its definition.
From cplusplus.com
Moves the current input position one position back to point to the previous character and, if supported, makes c available as that next character to be read.
If the implementation does not support writing to putback positions, c shall either match the character at the putback position or be the end-of-file value (traits_type::eof()). Otherwise, the function fails. [...]
If the get pointer (gptr) is at the beginning of the character sequence before the call, the function may either fail or make additional putback positions available and succeed, depending on the library implementation.
Or from cppreference:
1) The caller is requesting that the get area is backed up by one character (pbackfail() is called with no arguments), in which case, this function re-reads the file starting one byte earlier and decrements basic_streambuf::gptr(), e.g. by calling gbump(-1).
2) The caller attempts to putback a different character from the one retrieved earlier (pbackfail() is called with the character that needs to be put back), in which case
a) First, checks if there is a putback position, and if there isn't, backs up the get area by re-reading the file starting one byte earlier.
b) Then checks what character is in the putback position. If the character held there is already equal to c, as determined by Traits::eq(to_char_type(c), gptr()[-1]), then simply decrements basic_streambuf::gptr().
c) Otherwise, if the buffer is allowed to modify its own get area, decrements basic_streambuf::gptr() and writes c to the location pointed to by gptr() after adjustment.
So in essence both say that the input position is decremented (unless it is at the start of the file) and possibly a char is put back. So the following should succeed (assume prepared file according to comments, using std classes for comparison of behavior):
std::fstream f(..., std::ios::in | std::ios::out | std::ios::binary);
f.get();        // == 'a'
f.get();        // == 'b'
f.sync();       // or f.seekg(0)
f.putback('b');
f.putback('a');
f.unget();      // may fail (no-argument form of putback)
However on libc++ the first putback fails already and checking the source code I found pbackfail to be guarded by if (__file_ && this->eback() < this->gptr()) aka "if there is an open file and there is space at the front of the current read buffer".
A flush/sync/seek clears the read buffer which explains the failing putback. When using unbuffered IO there will only be a single char space in the read buffer so (at least) the 2nd putback will fail even without the flush. Or the second get might cross a buffer "border" which means "b" will be the first char in the current buffer which also makes the second putback fail.
Question: How is putback exactly specified? It seems to be only valid immediately after a get although both cppreference and cplusplus seem to imply that the read position is decremented in any case. If they are right, is libc++ non-conforming or am I missing anything?
Question: How is putback exactly specified?
It is specified in the section istream.unformatted.
However, that refers to sputbackc, and you'll probably want to read the general stream buffer requirements as well.

ifstream getline does not find delim

If the ifstream::getline call does not find the delimiter, I know it sets failbit. But does it also clear out the buffer, or does it leave the buffer intact and just set failbit to let you know?
There seems to be some confusion about the different states of an input stream (and rightly so, they are confusing):
C++ Standard, table 124
badbit indicates a loss of integrity in an input or output sequence (such as an irrecoverable read error from a file);
eofbit indicates that an input operation reached the end of an input sequence;
failbit indicates that an input operation failed to read the expected characters, or
that an output operation failed to generate the desired characters.
That is, failbit is set when basic_istream::getline(char_type* s, std::streamsize count, char_type delim) extracts count-1 characters without finding the delimiter (-1 for the terminating \0 stored). This does not indicate the stream is bad, but it rather indicates that getline has failed to find the delimiter.
Description of basic_istream::getline in the C++ Standard:
[istream.unformatted]/18
Effects: [...] After constructing a sentry object [= preparation for input and error checks], extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) [= delimiter found] for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
[...]
These conditions are tested in the order shown.
If the function extracts no characters, it calls setstate(failbit) [...]
In any case, if n is greater than zero, it then stores a null character [...] into the next successive location of the array.
[emphasis and omissions mine]
From here:
Internally, the function accesses the input sequence by first
constructing a sentry object (with noskipws set to true). Then (if
good), it extracts characters from its associated stream buffer object
as if calling its member functions sbumpc or sgetc, and finally
destroys the sentry object before returning.
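So the buffer is filled until there is a problem: the characters extracted before the failure stay in the array, which is null-terminated whenever n is greater than zero. The buffer is not cleared.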

Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor continues to get on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
The consequence of the eof bit being set is that the only reason the evil-idiom while (!stream.eof()) doesn't work when reading files is because of the extra \n at the end and not because the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or
extractors) and the unformatted input functions. Both groups of input functions are described as if they
obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use
other public members of istream.
So eof must be set.
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.

What is the rationale behind not having a writesome function for basic_ostream in the STL?

While working on a network library I recently noticed that having a counterpart to basic_streambuf::in_avail, in conjunction with a writesome function, would be pretty handy for asynchronous I/O.
I have searched the web and checked several C++ references if there is any function which meets these demands, but it seems I had no luck. The only source which mentions similar functionality is Boost's Asio library, however the description clearly states that the function call will block until at least one byte has been sent which does not reflect my desired behaviour.
To elaborate on my question, I created a rough draft based on the C++ N3337 publication.
27.6.3.2.5 Put area [streambuf.pub.put]
streamsize in_depart();
Returns: If a write position is available, returns epptr() - pptr(). Otherwise
returns showmanycp().
27.6.3.4.5 Put area [streambuf.virt.put]
streamsize showmanycp();
Returns: An estimate of the number of characters which can be written to the sequence,
or -1. If it returns a positive value, then successive calls to overflow() will not
return traits::eof() until at least that number of characters have been written to the
stream. If showmanycp() returns -1, then calls to overflow() will fail.
Default behavior: Returns zero.
Remarks: Uses traits::eof().
27.7.3.7 Unformatted output functions [ostream.unformatted]
streamsize writesome(char_type* s, streamsize n);
Effects: Behaves as an unformatted output function (as described in 27.7.3.7, paragraph
1). After constructing a sentry object, if !good() calls setstate(failbit) which may
throw an exception, and return. Otherwise writes n characters designated by s. If
rdbuf()->in_depart() == -1, calls setstate(badbit) (which may throw
ios_base::failure (27.5.5.4)), and writes no characters;
— If rdbuf()->in_depart() == 0, writes no characters.
— If rdbuf()->in_depart() > 0, writes min(rdbuf()->in_depart(), n) characters.
Returns: The number of characters written.
I'd guess you misinterpret the meaning of in_avail() and readsome(): all these say is that the stream has read a block of data and there are still characters in the buffer. Yes, it could theoretically do something different, but in particular when reading from a network you don't know how much data is available until you have tried reading it.
Similarly, there is no way to guarantee being able to get rid of any specific number of characters: what would out.writesome(buf, n) mean? If you want it to mean that you dumped n characters into out's buffer, you can just create a suitable stream buffer and use write(). Guaranteeing that n bytes are sent without blocking, however, can't be done (at least, for 1 < n). I guess you want the latter, though.

Contents of the string after failed extraction from istream

If I do this:
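#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream stream("somefilewhichopenssuccesfully.txt");
    std::string token;
    if (stream >> token)
        std::cout << token;  // extraction succeeded
    else
        std::cout << token;  // extraction failed: is token empty here?
}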
Is the output in the second case guaranteed to be an empty string? I can't seem to find the answer to this on cplusplus.com.
Thanks!
Is the output in the second case guaranteed to be an empty string?
The answer is no: it depends, as described below.
The else block is executed only if an attempt to read from the stream fails, and that can happen at any point during reading.
If it fails at the very first attempt, no characters are extracted from the stream, and hence token will be empty (as it was).
If it fails after a few reads, token will not be empty. It will contain the characters successfully read from the stream so far.
The section §21.3.7.9 from the Standard says,
Begins by constructing a sentry object k as if k were constructed by typename basic_istream<charT,traits>::sentry k(is). If bool(k) is true, it calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1,c). If is.width() is greater than zero, the maximum number n of characters appended is is.width(); otherwise n is str.max_size(). Characters are extracted and appended until any of the following occurs:
— n characters are stored;
— end-of-file occurs on the input sequence;
— isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.4.4.3).
Also note that the section §21.3.1/2 from the Standard guarantees that the default constructed string will be empty. The Standard says its size will be zero, that means, empty.
I deleted my original answer because I wanted to test this. What I see is that if there is an error while reading (EOF is not counted in this context), the original string is modified and the else branch sees the modified version. To test, I created a 2 GB file (touch, then truncate) and ran the code above to read it. While the code was running, I removed the file (this should set the failbit, I think). Reading stops immediately, but the string has been modified: it has a larger size.
To me this indicates that the string is modified even if the stream operation fails.
No, even if the operation fails, the string will contain the characters extracted so far.
The standard says (§21.4.8.9):
Effects: Behaves as a formatted input function (27.7.2.2.1). After constructing a sentry object, if the sentry converts to true, calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1,c). If is.width() is greater than zero, the maximum number n of characters appended is is.width(); otherwise n is str.max_size(). Characters are extracted and appended until any of the following occurs:
— n characters are stored;
— end-of-file occurs on the input sequence;
— isspace(c,is.getloc()) is true for the next available input character c.