I was working on this answer. And I ran into a conundrum: scanf has an assignment suppressing '*':
If this option is present, the function does not assign the result of the conversion to any receiving argument
But when used in get_time the '*' gives a run-time error on Visual Studio, libc++, and libstdc++: str >> get_time(&tmbuf, "%T.%*Y") so I believe it's not supported.
As such I chose to ignore input by reading into tmbuf.tm_year twice:
str >> get_time(&tmbuf, "%H:%M:%S.%Y UTC %b %d %Y");
This works and seems to be my only option so far as get_time goes since the '*' isn't accepted. But as we all know, just because it works doesn't mean it's defined. Can someone confirm that:
It is defined to assign the same variable twice in get_time
The stream will always be read left-to-right so the 1stincidence of %Y will be stomped, not the 2nd
The standard specifies the exact algorithm of processing the format string of get_time in 22.4.5.1.1 time_get members. (time_get::get is what eventually gets called when you do str>>get_time(...)). I quote the important parts:
The function starts by evaluating err = ios_base::goodbit. It then enters a loop, reading zero or more characters from s at each iteration. Unless otherwise specified below, the loop terminates when the first of the following conditions holds:
(8.1) — The expression fmt == fmtend evaluates to true.
skip boring error-handling parts
(8.4) — The next element of fmt is equal to ’%’, optionally followed by a modifier character, followed by a conversion specifier character, format, together forming a conversion specification valid for the ISO/IEC 9945 function strptime. skip boring error-handling parts the function evaluates s = do_get(s, end, f, err, t, format, modifier) skip more boring error-handling parts the function increments fmt to point just past the end of the conversion specification and continues looping.
As can be seen from the description, the format string is processed strictly sequentially left to right. There's no provision to handle repeating conversion specifications specially. So the answer must be yes, what you have done is it is well defined and perfectly legal.
Related
During writing some code I had a typo that lead to unexpected compilation results and caused me to play and test what would be acceptable by the compiler (VS 2010).
I wrote an expression consisting of only the parenthesis operator with a number in it (empty parenthesis give a compilation error):
(444);
When I ran the code in debug mode, it seems that the line is simply skipped by the program. What is the meaning of the parenthesis operator when it appears by itself?
If I can answer informally,
(444);
is a statement. It can be written wherever the language allows you to write a statement, such as in a function. It consists of an expression 444, enclosed in parentheses (which is also an expression) followed by the statement terminator ;.
Of course, any sane compiler operating in accordance with the as-if rule, will remove it during compilation.
One place where at least one statement is required is in a switch block (even if program control never reaches that point):
switch (1){
case 0:
; // Removing this statement causes a compilation error
}
(444); is a statement consisting of a parenthesized expression (444) and a statement terminator ;
(444) consists of parentheses () and a prvalue expression 444
A parenthesized expression (E) is a primary expression whose type, value, and value category are identical to those of E. The parenthesized expression can be used in exactly the same contexts as those where E can be used, and with the same meaning, except as otherwise indicated.
So in this particular case, parentheses have no additional significance,
so (444); becomes 444; which is then optimized out by the compiler.
I start to learn C++ from C. Recently, I have just read a tutorial book about C++. In section Introduce to streams, the book has noted:
The << operator is overloaded so that the operand on the right can be
a string or any primitive value. If this operand is not a string, the
<< operator converts it to string form before sending it to the output
stream.
So I wonder whether printf() function in C has the same effect. And if it doesn't, please tell me about the differences between both of them.
Well, of course it has to somehow generate a string representation of each argument, that is needed in order to have something to print. Printing involves sending streams of characters to an output device after all, you can't print unless you have a sequence of characters.
The printf() function uses the formatting string to control how to interpret each argument in order to create the character representation, and also how to format that representation when output.
Note that no "conversion" of the arguments happens that is visible externally, of course. There's no way
printf("%d\n", 47);
can make that 47 into a string in place; C uses call by value so the function only gets a copy of the value, and it then uses the type information implicit in the %d conversion specifier to figure out how to generate the two characters '4' and '7' that make up the printed representation.
So I wonder whether printf() function in C has the same effect.
Both C and C++ uses streams for output. In C it is stdout and in C++ it is cout.
Though it is not evident from the statement printf writes to standard output(stdout) say a terminal.
In case of cout, it is evident from a statement itself, where the output is going.
Some subtle differences
With cout you might need to include an additional header - say iomanip - and use some functions - say setw() - to have fine formatting where as in printf you rely on format string.
Performance - Each has its own advantage depending on what you print and where you print. I borrowed this point from here.
Another similarity
Both C++ and C standards mention nothing about the order of evaluation of function arguments. So you must not try fancy stuff with functions. For example neither should you do
printf(%d%d",++i,i++); // The behaviour is undefined.
nor should you do
cout<<++i<<++i; // The behaviour is undefined.
Note:
Remember that the c streams are available in C++ if you include the necessary headers.
Sample code at Coliru:
#include <iostream>
#include <sstream>
#include <string>
int main()
{
double d; std::string s;
std::istringstream iss("234cdefipxngh");
iss >> d;
iss.clear();
iss >> s;
std::cout << d << ", '" << s << "'\n";
}
I'm reading off N3337 here (presumably that is the same as C++11). In [istream.formatted.arithmetic] we have (paraphrased):
operator>>(double& val);
As in the case of the inserters, these extractors depend on the locale’s num_get<> (22.4.2.1) object
to perform parsing the input stream data. These extractors behave as formatted input functions (as
described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment:
typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
iostate err = iostate::goodbit;
use_facet< numget >(loc).get(*this, 0, *this, err, val);
setstate(err);
Looking over to 22.4.2.1:
The details of this operation occur in three stages
— Stage 1: Determine a conversion specifier
— Stage 2: Extract characters from in and determine a corresponding char value for the format
expected by the conversion specification determined in stage 1.
— Stage 3: Store results
In the description of Stage 2, it's too long for me to paste the whole thing here. However it clearly says that all characters should be extracted before conversion is attempted; and further that exactly the following characters should be extracted:
any of 0123456789abcdefxABCDEFX+-
The locale's decimal_point()
The locale's thousands_sep()
Finally, the rules for Stage 3 include:
— For a floating-point value, the function strtold.
The numeric value to be stored can be one of:
— zero, if the conversion function fails to convert the entire field.
This all seems to clearly specify that the output of my code should be 0, 'ipxngh'. However, it actually outputs something else.
Is this a compiler/library bug? Is there any provision that I'm overlooking for a locale to change the behaviour of Stage 2? (In another question someone posted an example of a system that does actually extract the characters, but also extracts ipxn which are not in the list specified in N3337).
Update
As pointed out by perreal, this text from Stage 2 is relevant:
If discard is true, then if ’.’ has not yet been accumulated, then the position of the character
is remembered, but the character is otherwise ignored. Otherwise, if ’.’ has already been
accumulated, the character is discarded and Stage 2 terminates. If it is not discarded, then a
check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.
If the character is either discarded or accumulated then in is advanced by ++in and processing
returns to the beginning of stage 2.
So, Stage 2 can terminate if the character is in the list of allowed characters, but is not a valid character for %g. It doesn't say exactly, but presumably this refers to the definition of fscanf from C99 , which allows:
a nonempty sequence of decimal digits optionally containing a decimal-point
character, then an optional exponent part as defined in 6.4.4.2;
a 0x or 0X, then a nonempty sequence of hexadecimal digits optionally containing a
decimal-point character, then an optional binary exponent part as defined in 6.4.4.2;
INF or INFINITY, ignoring case
NAN or NAN(n-char-sequence opt ), ignoring case in the NAN part, where:
and also
In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.
So, actually the Coliru output is correct; and in fact the processing must attempt to validate the sequence of characters extracted so far as a valid input to %g, while extracting each character.
Next question: is it permitted, as in the thread I linked to earlier, to accept i , n, p etc in Stage 2?
These are valid characters for %g , however they are not in the list of atoms which Stage 2 is allowed to read (i.e. c == 0 for my latest quote, so the character is neither discarded nor accumulated).
This is a mess because it's likely that neither gcc/libstdc++'s nor clang/libc++'s implementation is conforming. It's unclear "a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1" means, but I think that the use of the phrase "next character" indicates that check should be context-sensitive (i.e., dependent on the characters that have already been accumulated), and so an attempt to parse, e.g., "21abc", should stop when 'a' is encountered. This is consistent with the discussion in LWG issue 2041, which added this sentence back to the standard after it had been deleted during the drafting of C++11. libc++'s failure to do so is bug 17782.
libstdc++, on the other hand, refuses to parse "0xABp-4" past the 0, which is actually clearly nonconforming based on the standard (it should parse "0xAB" as a hexfloat, as clearly allowed by the C99 fscanf specification for %g).
The accepting of i, p, and n is not allowed by the standard. See LWG issue 2381.
The standard describes the processing very precisely - it must be done "as if" by the specified code fragment, which does not accept those characters. Compare the resolution of LWG issue 221, in which they added x and X to the list of characters because num_get as then-described won't otherwise parse 0x for integer inputs.
Clang/libc++ accepts "inf" and "nan" along with hexfloats but not "infinity" as an extension. See bug 19611.
At the end of stage 2, it says:
If it is not discarded, then a check is made to determine if c is
allowed as the next character of an input field of the conversion
specifier returned by Stage 1. If so, it is accumulated.
If the character is either discarded or accumulated then in is advanced by ++in and processing returns to the beginning of stage 2.
So perhaps a is not allowed in the %g specifier and it is not accumulated or ignored.
Is it perfectly ok (= well defined behaviour according to the standard) to call :
mystream.read(buffer, 0);
or
mystream.write(buffer, 0);
(and of course nothing will be read or written).
I would like to know if I have to test if the provided size is null before calling one of these two functions.
Yes, the behavior is well-defined: both functions will go through the motions for unformatted input/output functions (constructing the sentry, setting failbit if eofbit is set, flushing the tied stream if necessary), and then they will get to this clause:
§27.7.2.3[istream.unformatted]/30
Characters are extracted and stored until either of the following occurs:
— n characters are stored;
§27.7.3.7[ostream.unformatted]/5
Characters are inserted until either of the following occurs
— n characters are inserted;
"zero characters are stored/inserted" is true before anything is stored or extracted.
Looking at actual implementations, I see for (; gcount < n; ++gcount) in libc++ or sgetn(buffer, n); in stdlibc++ which has the equivalent loop
Another extraction from 27.7.2.3 Unformatted input functions/1 gives us a clue that zero-size input buffers are valid case:
unformatted input functions taking a character array of non-zero size as an argument shall also store a null character (using charT()) in the first location of the array.
In the C++ standard (section 27.6.1.3\24), for the
istream ignore() function in the IOStreams library, it implies that if you supply an argument for 'n' of numeric_limits::max(), it will continue to ignore characters
forever up until the delimiter is found, even way beyond the actual
max value for streamsize (i.e. the 'n' argument is interpreted as infinite).
For the gcc implementation this does indeed appear to be how
ignore() is implemented, but I'm still unclear as to
whether this is implementation specific, or mandated by the standard.
Can someone who knows this well confirm that this is guaranteed by a
standard compliant iostreams library?
The standard says that numeric_limits<streamsize>::max() is a special value that doesn't affect the number of characters skipped.
Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
-- if n != numeric_limits<streamsize>::max() (18.3.2), n characters are extracted
-- end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
-- traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
According to here:
istream& istream::ignore ( streamsize n = 1, int delim = EOF );
Extract and discard characters
Extracts characters from the input sequence and discards them.
The extraction ends when n characters have been extracted and discarded or when the character delim is found, whichever comes first. In the latter case, the delim character itself is also extracted.
In your case, when numeric_limits::max() number of characters have been reached, the first condition is met.
[Per Bo]
However, according to spec, the above case is applied only when n is not equal to numeric_limits<streamsize>::max().