On implementing a basic_filebuf I stumbled over basic_filebuf::pbackfail and don't fully understand its definition.
From cplusplus.com
Moves the current input position on position back to point to the previous character and, if supported, makes c available as that next character to be read.
If the implementation does not support writing to putback positions, c shall either match the character at the putback position or be the end-of-file value (traits_type::eof()). Otherwise, the function fails. [...]
If the get pointer (gptr) is at the beginning of the character sequence before the call, the function may either fail or make additional putback positions available and succeed, depending on the library implementation.
Or from cppreference:
1) The caller is requesting that the get area is backed up by one character (pbackfail() is called with no arguments), in which case, this function re-reads the file starting one byte earlier and decrements basic_streambuf::gptr(), e.g. by calling gbump(-1).
2) The caller attempts to putback a different character from the one retrieved earlier (pbackfail() is called with the character that needs to be put back), in which case
a) First, checks if there is a putback position, and if there isn't, backs up the get area by re-reading the file starting one byte earlier.
a) Then checks what character is in the putback position. If the character held there is already equal to c, as determined by Traits::eq(to_char_type(c), gptr()[-1]), then simply decrements basic_streambuf::gptr().
b) Otherwise, if the buffer is allowed to modify its own get area, decrements basic_streambuf::gptr() and writes c to the location pointed to gptr() after adjustment.
So in essence both say that the input position is decremented (unless it is at the start of the file) and possibly a char is put back. So the following should succeed (assume prepared file according to comments, using std classes for comparison of behavior):
std::fstream f(..., in | out | binary);
f.get() // == 'a'
f.get() // == 'b'
f.sync(); // or f.seekg(0)
f.putback('b');
f.putback('a');
f.putback(); // may fail
However on libc++ the first putback fails already and checking the source code I found pbackfail to be guarded by if (__file_ && this->eback() < this->gptr()) aka "if there is an open file and there is space at the front of the current read buffer".
A flush/sync/seek clears the read buffer which explains the failing putback. When using unbuffered IO there will only be a single char space in the read buffer so (at least) the 2nd putback will fail even without the flush. Or the second get might cross a buffer "border" which means "b" will be the first char in the current buffer which also makes the second putback fail.
Question: How is putback exactly specified? It seems to be only valid immediately after a get although both cppreference and cplusplus seem to imply that the read position is decremented in any case. If they are right, is libc++ non-conforming or am I missing anything?
Question: How is putback exactly specified?
It is specified in the section istream.unformatted.
However, that refers to sputbackc, and you'll probably want to read the general stream buffer requirements as well.
Related
Say I have an input operation:
file >> x;
If the internal buffer of file is empty underflow() will be called to import characters from the external device to the internal buffer of file. It is implementation-defined if the buffer will be partially or completely filled after this flush operation. Taking that into account, is it possible that if x is a string and I am expecting an input value of a certain length, that the buffer is in its right to transfer fewer characters than that? Can this happen?
There is no real constraint on how many characters underflow() makes available. The only real constraint is that a stream which hasn't reached EOF needs to make, at least, one character available. With respect specifically std::filebuf (or std::basic_filebuf<...>) the stream may be unbuffered (if setbuf(0, 0) was called) in which case it would, indeed, make individual characters available. Otherwise, the stream will try to fill its internal buffer and rely on the operating system to have a the underlying operation return a suitable amount of bytes if there are few available, yet.
I'm not sure I quite understand your question: the operation file >> x will return once x is completely read which can happen if the stream indicated by file has reached its end or when a whitespace character is found (and if with "string" you mean char*, a non-zero value stored in file.width() is also taken into account). With respect to the underlying stream buffer, clearly x may require multiple reads to the underlying representation, i.e., it is unpredictable how many calls to underflow() are made. Given that the file's internal buffer is probably matching the disc's block size, I would expect that at most one call to underflow() is made for "normal" strings. However, if the file read is a huge and doesn't contain any spaces many calls to underflow() may be made. Given that the stream needs to find whitespaces it has no way to predict how many characters are needed in the first place.
If the ifstream::getline call does not find the delimeter I know it sets failbit but does it also clear out the buffer or does it leave the buffer intact and just set the fail bit to let you know?
There seems to be some confusion about the different states of an input stream (and rightly so, they are confusing):
C++ Standard, table 124
badbit indicates a loss of integrity in an input or output sequence (such as an irrecoverable read error from a file);
eofbit indicates that an input operation reached the end of an input sequence;
failbit indicates that an input operation failed to read the expected characters, or
that an output operation failed to generate the desired characters.
That is, failbit is set when basic_istream::getline(char_type* s, std::streamsize count, char_type delim) extracts count-1 characters without finding the delimiter (-1 for the terminating \0 stored). This does not indicate the stream is bad, but it rather indicates that getline has failed to find the delimiter.
Description of basic_istream::getline in the C++ Standard:
[istream.unformatted]/18
Effects: [...] After constructing a sentry object [= preparation for input and error checks], extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) [= delimiter found] for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
[...]
These conditions are tested in the order shown.
If the function extracts no characters, it calls setstate(failbit) [...]
In any case, if n is greater than zero, it then stores a null character [...] into the next successive location of the array.
[emphasis and omissions mine]
From here:
Internally, the function accesses the input sequence by first
constructing a sentry object (with noskipws set to true). Then (if
good), it extracts characters from its associated stream buffer object
as if calling its member functions sbumpc or sgetc, and finally
destroys the sentry object before returning.
It seems that the buffer is filled until there is a problem. (See DyP's comments)
Is it perfectly ok (= well defined behaviour according to the standard) to call :
mystream.read(buffer, 0);
or
mystream.write(buffer, 0);
(and of course nothing will be read or written).
I would like to know if I have to test if the provided size is null before calling one of these two functions.
Yes, the behavior is well-defined: both functions will go through the motions for unformatted input/output functions (constructing the sentry, setting failbit if eofbit is set, flushing the tied stream if necessary), and then they will get to this clause:
§27.7.2.3[istream.unformatted]/30
Characters are extracted and stored until either of the following occurs:
— n characters are stored;
§27.7.3.7[ostream.unformatted]/5
Characters are inserted until either of the following occurs
— n characters are inserted;
"zero characters are stored/inserted" is true before anything is stored or extracted.
Looking at actual implementations, I see for (; gcount < n; ++gcount) in libc++ or sgetn(buffer, n); in stdlibc++ which has the equivalent loop
Another extraction from 27.7.2.3 Unformatted input functions/1 gives us a clue that zero-size input buffers are valid case:
unformatted input functions taking a character array of non-zero size as an argument shall also store a null character (using charT()) in the first location of the array.
While working on a network library I recently noticed that having a pendant to basic_streambuf::in_avail in conjunction with a writesome function would be pretty handy for asynchronous I/O.
I have searched the web and checked several C++ references if there is any function which meets these demands, but it seems I had no luck. The only source which mentions similar functionality is Boost's Asio library, however the description clearly states that the function call will block until at least one byte has been sent which does not reflect my desired behaviour.
To elaborate on my question, I created a rough draft based on the C++ N3337 publication.
27.6.3.2.5 Put area [streambuf.pub.put]
streamsize in_depart();
Returns: If a write position is available, returns epptr() - pptr(). Otherwise
returns showmanycp().
27.6.3.4.5 Put area [streambuf.virt.put]
streamsize showmanycp();
Returns: An estimate of the number of characters which can be written to the sequence,
or -1. If it returns a positive value, then successive calls to overflow() will not
return traits::eof() until at least that number of characters have been written to the
stream. If showmanycp() returns -1, then calls to overflow()will fail.
Default behavior: Returns zero.
Remarks: Uses traits::eof().
27.7.3.7 Unformatted output functions [ostream.unformatted]
streamsize writesome(char_type* s, streamsize n);
Effects: Behaves as an unformatted output function (as described in 27.7.3.7, paragraph
1). After constructing a sentry object, if !good() callssetstate(failbit) which may
throw an exception, and return. Otherwise writes n characters designated by s. If
rdbuf()->in_depart() == -1, calls setstate(badbit) (which may throw
ios_base::failure (27.5.5.4)), and writes no characters;
— If rdbuf()->in_depart() == 0, writes no characters.
— If rdbuf()->in_depart() > 0, writes min(rdbuf()->in_depart(), n)) characters.
Returns: The number of characters written.
I'd guess you misinterpret the meaning of in_avail() and readsome(): all these say is that the stream had read a block of data and there are still characters in the buffer. Yes, it could theoretically do something different but in particukar when reading from a network you don't know how much data is available until you tried reading it.
Similarily, there is no way to guarantee to be able to get rid of a y specific number of characters: what would out.writesome(buf, n) mean? If you want it to mean that you dumped n characters into out's buffer, you can just create a suitable stream buffer and use write(). Guaranteeing that n bytes are sent with blocking, however, can't be done (at least, for 1 < n). I guess you want the latter, though.
I'm using a Standard iostream to get some input from a file, and I'm confused about unget() versus putback(character). It seems to me from the documentation that these functions are effectively identical, where unget() just remembers the character put in, so I'm nervous. I've always used putback(character), but character is always the last read character and I've been thinking about changing to unget(). Is putback(character) always identical to unget(), if character is always the last read character?
You can't lie with unget(). It "ungets" the last-read character. You can lie with putback(c). You can "putback" some character other than the last-read character. Sometimes putting back a character other than the last-read character can be useful.
Also, if the underlying read buffer really does have buffering capability, you can "putback" more than one character. I think ungetc() is limited to one character.
Edit
Nope. It looks like unget() can go as far back as putback().
It's not the answer you probably expect, but want to introduce my reasoning. Documentation stays that the methods putback and unget call streambuf::sputbackc and streambuf::sungetc respectively. Definitions are as follow:
streambuf::sungetc
Moves the get pointer one character backwards, making the last character gotten by an input operation available once again for the next input operation.
During its operation, the function will call the protected virtual member function pbackfail if the get pointer gptr points to the same position as the beginning pointer eback.
The other one:
streambuf::sputbackc
The get pointer is moved back to point to the character right before its current position so the last character gotten, c, becomes available again as the character to be read at that position by the next input operation.
During its operation, the function calls the protected virtual member function pbackfail either if the character c doesn't match gptr()[-1] or if the get pointer gptr points to the same position as the beginning pointer eback.
When c does not match the character at that position, the default definition of pbackfail in streambuf will prepend c to be the character extracted at that position if possible, but derived classes may override this behavior.
The member function sungetc behaves in a similar way but without taking any parameter
As sputbackc calls pbackfail if character doesn't match, it means the method has to check if the values are equal. It looks like the additional check is the only overhead, but have no idea how it is solved in practise. I can imagine that if the last character is not stored in the object then it has to be reread, so you might expect it even when the characters are guaranteed to be the same.
I was a little bit concerned about situation when we call unget, but last character is not available. Would the putback put the value correctly? I doubt, but it shouldn't be the case while operating on files.