Why does an insertion into a stream reset its width to 0? - c++

It seems that an insertion causes a stream to reset its width to 0. In other words, you need to call width() again whenever you want to align your output, which makes me wonder why. I've looked through the C++11 standard and found a partial answer: according to §27.7.3.6.4.1 of the ISO C++ standard, width(0) is called inside the character inserters. However, those are just a subset of the inserters, so this is not a complete answer. What about the arithmetic inserters? I couldn't find any other reference that explains this behavior of streams.

Because that's the way the streams were specified. In an << operator,
the width should be reset; the other formatting options should not. I can
only guess as to why: presumably, for things like:
std::cout << std::setw(8) << someValue << " cm";
You wouldn't want the width to apply to the string " cm", for example.
But you would want to be able to write:
std::cout << std::setw(8) << price << " | " << std::setw(20) << article;
where article is a string.
Except that, of course, for that to work, you'd have to also change the
justification before the string, and the change in justification would
be sticky, affecting the next numeric output.
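Below is a minimal sketch of that point. It writes into a std::ostringstream so that the result can be checked. The width set by std::setw is used up by the next inserter, but std::left stays in effect and changes how the next number is padded. The function name twoRows is made up for illustration.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Writes two table rows to one stream. std::setw must be repeated for
// every field, while std::left, once set, stays in effect.
std::string twoRows() {
    std::ostringstream out;
    out << std::setw(6) << 42 << " | "                // number right-aligned (default)
        << std::left << std::setw(6) << "ab" << '\n'; // string left-aligned
    out << std::setw(6) << 7 << " | "                 // left is sticky: number is now left-aligned
        << std::setw(6) << "cd" << '\n';
    return out.str();
}
```

The second row prints "7     " instead of "     7", which is exactly the sticky-justification problem described above.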
In practice, of course, experienced programmers don't write this sort of
code. They'd use something like:
std::cout << price(column1Width) << article.price
<< " | " << label(column2Width) << article.name;
(supposing that they still had to generate tables using a fixed-width
font), where price and label are manipulators which set any number
of format flags and restore them in their destructor. (Since they are
temporaries, their destructor will be called at the end of the full
expression.) This way, this particular line of code doesn't say
anything about the physical formatting, but rather that the first value
should be formatted as a price, and the second as a label. (And of
course, if someone higher up later decides that prices or labels should
be formatted differently, you just change the manipulators, rather than
searching all of the output statements, and trying to figure out which
are prices, and which aren't.)
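Here is one possible sketch of such manipulators. The classes ScopedFormat, price and label are hypothetical, not a library API. Each temporary saves the stream's flags, precision and fill when it is inserted, and restores them in its destructor. The sketch assumes C++17, where the operands of a << chain are evaluated left to right. The temporaries are therefore destroyed in reverse order, and the original state is the one restored last.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: saves the state of the stream it is inserted into and
// restores it when the temporary is destroyed at the end of the full expression.
class ScopedFormat {
public:
    ScopedFormat(const ScopedFormat&) = delete;
    ScopedFormat& operator=(const ScopedFormat&) = delete;

protected:
    ScopedFormat() = default;
    ~ScopedFormat() {
        if (stream_ != nullptr) {
            stream_->flags(flags_);
            stream_->precision(precision_);
            stream_->fill(fill_);
        }
    }
    // Called from operator<< before the stream state is modified.
    void save(std::ostream& out) const {
        stream_ = &out;
        flags_ = out.flags();
        precision_ = out.precision();
        fill_ = out.fill();
    }

private:
    mutable std::ostream* stream_ = nullptr;
    mutable std::ios_base::fmtflags flags_{};
    mutable std::streamsize precision_ = 0;
    mutable char fill_ = ' ';
};

// Hypothetical "price" manipulator: right-aligned, fixed, two decimals.
class price : public ScopedFormat {
public:
    explicit price(int width) : width_(width) {}
    friend std::ostream& operator<<(std::ostream& out, const price& p) {
        p.save(out);
        out.setf(std::ios_base::fixed, std::ios_base::floatfield);
        out.setf(std::ios_base::right, std::ios_base::adjustfield);
        out.precision(2);
        out.fill(' ');
        out.width(p.width_);
        return out;
    }
private:
    int width_;
};

// Hypothetical "label" manipulator: left-aligned text.
class label : public ScopedFormat {
public:
    explicit label(int width) : width_(width) {}
    friend std::ostream& operator<<(std::ostream& out, const label& l) {
        l.save(out);
        out.setf(std::ios_base::left, std::ios_base::adjustfield);
        out.fill(' ');
        out.width(l.width_);
        return out;
    }
private:
    int width_;
};

std::string formatRow(double amount, const std::string& name, double after) {
    std::ostringstream out;
    out << price(8) << amount << " | " << label(10) << name << '|';
    // Both temporaries have been destroyed here, so default formatting is back.
    out << ' ' << after;
    return out.str();
}
```

For example, formatRow(3.5, "Widget", 2.5) yields "    3.50 | Widget    | 2.5". The trailing 2.5 is printed with the default settings again.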
EDIT (added references to the standard):
It's important to note that the standard cannot cover everything here,
since most of the time, you'll be using custom operator<<, written by
the author of the class you're outputting. Most of the built-in
operator<< are covered by §22.4.2.2.2, in its description of stage 3:
If str.width() is nonzero and the number of charT’s in the sequence
after stage 2 is less than str.width(), then enough fill characters are
added to the sequence at the position indicated for padding to bring the
length of the sequence to str.width().
str.width(0) is called.
For characters and C style strings, this is specified (in much the same
way) in §27.7.3.6.4. For std::string, see §21.4.8.8.
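A small check of those rules, assuming only standard inserters. The sketch sets a width, performs one insertion, and looks at what is left. The helper names widthAfter and padded are illustrative.

```cpp
#include <sstream>
#include <string>

// Sets a width, performs one insertion, and reports the width left behind.
template <typename T>
std::streamsize widthAfter(const T& value) {
    std::ostringstream out;
    out.width(10);
    out << value;
    return out.width();
}

// Shows the padding produced for a value inserted with width 6.
template <typename T>
std::string padded(const T& value) {
    std::ostringstream out;
    out.width(6);
    out << value << '|';  // '|' is written after the width was reset
    return out.str();
}
```

With a conforming library, widthAfter returns 0 for an int, a double, a char, a string literal and a std::string. padded(42) gives "    42|".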
For std::complex, the standard defines insertion in terms of other
insertion operators: the value is first written into a local
std::basic_ostringstream that copies the stream's flags, locale and
precision (but not its width), and the resulting string is then
inserted. Any setting of width thus applies to the complete "(re,im)"
text, not to the individual elements. (When I implemented a pre-standard
complex class, my << checked the width, and if it was non-zero,
subtracted 3 for the non-numeric fields, then divided by 2 and set it
before outputting each double. I also used a semicolon as separator if
the decimal point was a comma, as it is in most places I've lived.)
The fact that the other formatting options remain unchanged is because
the standard doesn't specify anything, and additional side effects,
other than those specified, are assumed to be forbidden.
In all cases, the specification says that stream.width(0) is called.
But in the standard, each case is specified separately. There is no
specification (even for intent) of any general rule that I can find.
Traditionally, however, the rule has always been: reset width, and leave
the others unchanged. The principle of least surprise says that you
should do this for user defined << (and >>) as well.
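One way a user-defined << can follow that rule is shown below. It is a sketch, and the type Point is hypothetical. The whole value is formatted into a temporary string first, and that string is then inserted once. The std::string inserter applies the caller's width to the complete field and resets it. The caller's other settings are left alone.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point {
    int x;
    int y;
};

// Copies the caller's flags, locale and precision into a temporary stream,
// formats "(x,y)" there, then inserts the result as a single string. The
// string inserter pads to out.width() and then calls out.width(0).
std::ostream& operator<<(std::ostream& out, const Point& p) {
    std::ostringstream tmp;
    tmp.flags(out.flags());
    tmp.imbue(out.getloc());
    tmp.precision(out.precision());
    tmp << '(' << p.x << ',' << p.y << ')';
    return out << tmp.str();
}

std::string demoPoint() {
    std::ostringstream out;
    out << std::setw(8) << Point{1, 2} << '|' << Point{3, 4} << '|';
    return out.str();
}
```

demoPoint() yields "   (1,2)|(3,4)|". Only the first Point is padded, just like a built-in type.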

Related

Writing single-char vs. char const* to buffer

When writing single characters to an output stream, the purist in me wants to use single quotes, e.g.:
unsigned int age{40};
std::ostringstream oss;
oss << "In 2022, I am " << age << '\n'; // 1. Single quotes around \n
oss << "In 2023, I will be " << age + 1u << "\n"; // 2. Minor ick--double quotes around \n
Because I'm writing a single character and not an arbitrary-length message, it doesn't seem necessary to have to provide a null-terminated string literal.
So I decided to measure the difference in speed. Naively, I'd expect option 1, the single-character version, to be faster (only one char, no need to handle \0). However, my test with Clang 13 on quick-bench indicates that option 2 is a hair faster. Is there an obvious reason for this?
https://quick-bench.com/q/3Zcp62Yfw_LMbh608cwHeCc0Nd4
Of course, if the program is spending a lot of time writing data to a stream anyway, chances are the program needs to be rethought. But I'd like to have a reasonably correct mental model, and because the opposite of what I expected happened, my model needs to be revised.
As you can see in the assembly and in the libc++ source here, both << operations in the end call the same function __put_character_sequence which the compiler decided to not inline in either case.
So, in the end you are passing a pointer to the single char object anyway and if there is a pointer indirection overhead it applies equally to both cases.
__put_character_sequence also takes the length of the string as an argument, which the compiler can easily evaluate at compile time for "\n" as well, so the single character has no advantage there either.
In the end it probably comes down to the compiler having to store the single character on the stack since without inlining it can't tell whether __put_character_sequence will modify it. (The string literal cannot be modified by the function and also would have the same identity between iterations of the loop.)
If the standard library used a different approach or the compiler did inline slightly differently, the result could easily be the other way around.
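Speed aside, both forms are formatted output and produce the same characters, including padding when a width is pending. This sketch checks only the output, not the timing. The function names are illustrative.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Single-character inserter: padded to the pending width like any field.
std::string withChar() {
    std::ostringstream out;
    out << std::setw(3) << 'x' << '\n';
    return out.str();
}

// String-literal inserter: same padding, same result.
std::string withLiteral() {
    std::ostringstream out;
    out << std::setw(3) << "x" << "\n";
    return out.str();
}
```

Both functions return "  x\n".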

C++ When should I std::ctype<char>::widen()?

Apparently, writing a single character of type char to a stream whose char type is char is guaranteed by the standard to not invoke ctype<char>.widen() on the associated locale.
On the other hand, according to my reading of the standard (C++17), when writing a string of chars (const char*) instead of a single char, ctype<char>.widen() must be invoked.
I am struggling to understand how to make sense of this.
On the one hand, the fact that widen() is required when writing strings suggests that there are valid scenarios where widen() has an effect. But if that is the case, how can it be all right to omit the widening operation when writing single characters?
It seems to me that there must be an intended difference in the roles (domains of applicability) of the two operations, output of a single char (char) and output of a string (const char*), but I do not see what it is.
To make things more concrete, let us say that I wanted to implement an output operator for a range object, and have the output be in the form 0->2. My first inkling would be something like this:
std::ostream& operator<<(std::ostream& out, const Range& range)
{
// ...
out << "->"; // Invokes widen()
// ...
}
But, is this how I am supposed to do it? Or would out << '-' << '>' (no widening) have been better / more correct?
Curiously, the formulation of the standard suggests to me that the two forms do not always produce the same result. Also, as far as I can tell, the latter form (with separate chars), could be much faster on some platforms.
What is the upshot? What are the rules that should guide me in choosing between the two types of output operations?
For reference, here is an earlier attempt of mine at posing the same question (3 years ago): C++ What is the role of std::ctype<char>::widen()?
Since the old question never got much traction, I'd prefer to mark that one as a duplicate of this one, rather than vice versa.
EDIT: I recognize that a good output operator might not want to use formatted output operations internally, but that is not what I am interested in here. I'm interested in the reasoning behind the difference in behavior of the two types of output operations.
EDIT: Here is one explanation that would make sense to me: << on single char is to be understood as a special case of << on std::string, and not as a special case of << on const char*. But, is this the right explanation? If so, I believe it means that I should use << "->" above. Not << '-' << '>'.
EDIT: Here is what makes me think that the explanation above (2nd EDIT) is not the right one: in the case of a wchar_t stream, both << on char and << on const char* invoke widen(), so from this point of view, they are in the same "family". So, from a consistency point of view, we should expect that when we switch the stream type from wchar_t to char, either both of those operators should still invoke widen(), or neither should.
EDIT: Here is another kind of explanation, which I don't think is right, but I'll include it for exposition: for a char stream out, out << "->" has the same effect as out << '-' << '>', because even though the first form is required to invoke widen(), widen() is required to be a "no op" on a char stream in any locale (I don't believe this is the case). So, while there may be a significant difference in performance, the results are always the same. This would suggest that the difference in the wording of the required behavior is an unintended but fairly benign accident. If this is the right explanation, then I should choose out << '-' << '>' because of the possibly much better performance.
EDIT: OK, I found another 3-year-old question of mine, where I come at it from a slightly different angle: C++ When are characters widened in output stream operator<<()?. The comments from Dietmar Kühl suggest that widen() is always a "no op" on a char stream, and that the whole "issue" is due to imprecise wording in the standard. If so, my second proposed explanation above (4th EDIT) would be correct. Still, it would be nice to get this corroborated by somebody else.
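The sketch below does not settle what the standard intends. It only shows the observable behaviour on a typical implementation. In the classic "C" locale, ctype<char>::widen maps each char to itself. On a wchar_t stream, both the char inserter and the const char* inserter widen the characters. The helper names are illustrative.

```cpp
#include <locale>
#include <sstream>
#include <string>

// In the classic locale, ctype<char>::widen returns the character unchanged.
char widenClassic(char c) {
    return std::use_facet<std::ctype<char>>(std::locale::classic()).widen(c);
}

// On a wide stream, the char and const char* inserters both widen their
// characters through the stream's locale before writing them.
std::wstring wideArrow() {
    std::wostringstream out;
    out << '-' << '>' << " " << "->";
    return out.str();
}
```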

Refactoring C-style pretty-printing into C++-style pretty-printing

I want to refactor some printf/sprintf/fprintf statements into ostream/sstream/fstream statements. The code in question pretty-prints a series of integers and floating-point numbers, using whitespace padding and fixed numbers of decimal points.
It seems to me that this would be a good candidate for a Martin Fowler style writeup of a safe, step-by-step refactorings, with important gotchas noted. The first step, of course, is to get the legacy code into a test harness, which I have done.
What slow and careful steps can I take to perform this refactoring?
If refactoring is not the goal in itself, you can avoid it altogether (well, almost) by using a formatting library such as tinyformat which provides an interface similar to printf but is type safe and uses IOStreams internally.
Basic mechanics of the conversion:
Convert each printf-style clause %w.pf, where w is the field width and p is the number of digits of precision, into << setw(w) << setprecision(p) << fixed. Convert %w.pe the same way, but with << scientific instead of << fixed.
Convert each printf-style clause %wd or %wi, where w is the field width, into << setw(w).
Convert "\n" to endl where appropriate (endl also flushes the stream, so keep "\n" where no flush is needed).
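Here is a sketch of converting one format string clause by clause. It is illustrative rather than the author's code. Keeping the legacy snprintf version next to the new one makes it easy to compare them in the test harness.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Legacy version: "%8.3f|%5d\n" rendered into a buffer.
std::string legacyFormat(double x, int n) {
    char text[64];
    std::snprintf(text, sizeof text, "%8.3f|%5d\n", x, n);
    return text;
}

// Refactored version: each clause converted one by one.
std::string streamFormat(double x, int n) {
    std::ostringstream out;
    out << std::setw(8) << std::setprecision(3) << std::fixed << x  // %8.3f
        << '|'
        << std::setw(5) << n                                         // %5d
        << '\n';
    return out.str();
}
```

One gotcha: fixed and setprecision stay in effect for later output on the same stream, while setw does not. A later clause with a different precision, or a %g clause, needs its own manipulators.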
Process for printf:
Create a char[] (let's call it text) with enough total width.
Convert the printf(...) to sprintf(text, ...), and use cout << text to actually print the text.
Complete using the common instructions.
Process for fprintf:
Same as printf, but use the appropriate fstream instead of cout.
If you already have an opened C-style FILE object that you do not want to refactor at this time, it gets a little sticky (but can be done).
Complete using the common instructions.
Process for sprintf:
If the string being written to is only used to output to a stream in the current context, refer to one of the two refactorings above.
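Otherwise, begin by creating a stringstream and streaming into it the contents you were writing into the char[]. If you still need a C string from it, store the result of str() in a named std::string first and call c_str() on that. str() returns a temporary, so a pointer obtained from std::stringstream::str().c_str() dangles once the full expression ends.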
Complete using the common instructions.
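A sketch of the sprintf replacement when a C string is still needed. The function legacyConsume stands in for a hypothetical legacy API that takes a const char*.

```cpp
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical legacy function that still expects a C string.
std::size_t legacyConsume(const char* text) {
    return std::strlen(text);
}

std::size_t formatForLegacy(double x) {
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(2) << std::setw(8) << x;
    // oss.str() returns a temporary std::string. Storing oss.str().c_str()
    // in a pointer would dangle after this statement, so keep a named copy.
    const std::string text = oss.str();
    return legacyConsume(text.c_str());
}
```

For example, formatForLegacy(3.5) passes "    3.50" (8 characters) to the legacy function.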
Common instructions:
Convert each clause one by one into C++-style.
Remove *printf and char[] declarations as necessary when finished.
Apply other refactorings, particularly "Extract Method" (Fowler, Refactoring) as necessary.

formatted output of a number and field width: what does the C++ standard say about it?

This code snippet:
//
// This is example code from Chapter 11.2.5 "Fields" of
// "Programming -- Principles and Practice Using C++" by Bjarne Stroustrup
//
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
cout << 123456 // no field used
<<'|'<< setw(4) << 123456 << '|' // 123456 doesn't fit in a 4 char field
<< setw(8) << 123456 << '|' // set field width to 8
<< 123456 << "|\n"; // field sizes don't stick
}
produces this output:
123456|123456|  123456|123456|
The second print of 123456 is not truncated to fit in a field with width of 4 and Stroustrup explains that it is the right thing to do because a bad looking table with right numbers is better than a good looking table with wrong numbers.
Where does the C++ standard specify this behaviour?
I found ios_base::width where the standard says:
The minimum field width (number of characters) to generate on certain
output conversions
Is "minimum" the keyword here to explain the said behaviour?
The statement you cite is a generic description. Regardless of what is
being output, the field will have at least that many characters; that is
the meaning of minimum. The exact meaning of the field depends on the
type of data being output. In the case of integer output, the exact
format is specified in §22.4.2.2; this covers not only how the width
field is interpreted, including a guarantee that the field will not be
wider than the width unless that is necessary to display the value in
the specified format, but also what character to use for the fill, and
where to put it.
(Stroustrup's example leaves all of the other parameters with their
default values, but if you have a negative number and specify a fill
character of '0', you probably want |-0001234| rather than |000-1234|;
setting the adjustment to std::internal is what gives you the former.)
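A short sketch of both points. A field that is too narrow is widened, not truncated. The fill position depends on the adjustfield flags: by default the fill goes before the sign, and with std::internal it goes after it. The function name is illustrative.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

std::string fieldDemo() {
    std::ostringstream out;
    out << std::setw(4) << 123456 << '|';                      // too wide: not truncated
    out << std::setfill('0') << std::setw(8) << -1234 << '|';  // default: fill before the sign
    out << std::internal << std::setw(8) << -1234 << '|';      // internal: fill after the sign
    return out.str();
}
```

fieldDemo() yields "123456|000-1234|-0001234|".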
For user-defined types, it's entirely possible that the field contains
fewer characters than the minimum. I would consider this a bug, but I
imagine a lot of user-defined << are written without consideration of
any of the formatting parameters. The actual effect of std::setw is only
to set a field in the stream (the width member of std::ios_base, a base
of std::basic_ios<char>); it's up to the implementation of << to handle
it correctly.
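The following shows one way a user-defined << can get this wrong. The type Tag is hypothetical. Its operator<< uses the unformatted write(), which neither pads to the width nor resets it. So the value comes out shorter than the minimum, and the pending width lands on the next thing inserted.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type whose operator<< ignores the format state.
struct Tag {
    std::string name;
};

// Buggy on purpose: write() is unformatted output, so out.width() is
// neither honoured nor reset.
std::ostream& operator<<(std::ostream& out, const Tag& t) {
    return out.write(t.name.data(), static_cast<std::streamsize>(t.name.size()));
}

std::string tagDemo() {
    std::ostringstream out;
    out << std::setw(6) << Tag{"ab"} << 7 << '|';
    return out.str();
}
```

tagDemo() yields "ab" followed by "     7|". The width meant for the Tag pads the 7 instead.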

What's the deal with setw()?

I recently was bitten by the fact that ios_base::width and/or the setw manipulator have to be reset with every item written to the stream.
That is, you must do this:
while(whatever)
{
mystream << std::setw(2) << myval;
}
Rather than this:
mystream.width(2);
while(whatever)
{
mystream << myval;
}
Ok, fine.
But does anyone know why this design decision was made?
Is there some rationale that I'm missing, or is this just a dark corner of the standard?
Other stream formatting modifiers (as mentioned in the linked SO question) are 'sticky', while setw is not.
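A sketch of the two loops from the question, with illustrative function names. Setting the width once before the loop pads only the first value, because the first insertion resets the width to 0.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Width set once before the loop: only the first value is padded.
std::string widthOnce(const std::vector<int>& values) {
    std::ostringstream out;
    out.width(2);
    for (int v : values) {
        out << v;
    }
    return out.str();
}

// Width set for every item: each value gets its own field.
std::string widthEachTime(const std::vector<int>& values) {
    std::ostringstream out;
    for (int v : values) {
        out << std::setw(2) << v;
    }
    return out.str();
}
```

For {1, 2, 3}, widthOnce returns " 123" and widthEachTime returns " 1 2 3".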
The decisions of which manipulators should affect only the next operation seem to be based on logical and empirical observations about what tends to factor common functional needs better, and hence be easier for the programmer to write and get right.
The following points strike me as relevant:
some_stream << x should just work right most of the time
most code that sets the width will immediately or very shortly afterwards stream the value, so unrelated code can assume there won't be some "pending" width value affecting its output
setfill() is not relevant unless there's a pending setw(), so won't adversely affect the some_stream << x statement topping our list
only when width is being explicitly set, the programmer can/must consider whether the fill character state is appropriate too, based on their knowledge of the larger calling context
it's very common for a set of values to use the same fill character
other manipulators like hex and oct are persistent, but their use is typically in a block of code that either pops the prior state or (nasty but easier) sets it back to decimal
The point that follows from this and answers your question: if setw()
were persistent, it would need to be reset between streaming statements
to prevent unwanted fill.
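A sketch of the points in the list above, with illustrative function names. A persistent setfill() is harmless once no width is pending. The persistent hex flag, by contrast, is typically saved and restored around the code that uses it.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// setfill() persists, but it only matters while a width is pending, so the
// later unpadded output is unaffected.
std::string clockDemo(int h, int m) {
    std::ostringstream out;
    out << std::setfill('0') << std::setw(2) << h << ':' << std::setw(2) << m
        << " (" << m << ")";
    return out.str();
}

// hex persists, so the flags are saved and restored around its use.
std::string hexThenDec(int v) {
    std::ostringstream out;
    const std::ios_base::fmtflags saved = out.flags();
    out << std::hex << v;
    out.flags(saved);  // back to the previous (decimal) state
    out << ' ' << v;
    return out.str();
}
```

clockDemo(9, 5) yields "09:05 (5)", and hexThenDec(255) yields "ff 255".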
The way I see it: you can always do something like the following if you want the width applied uniformly:
int width =2;
while(whatever)
{
mystream << std::setw(width) << myval;
}
But if it were sticky, as you suggest:
mystream.width(2);
while(whatever)
{
mystream << myval;
}
and I wanted a different width on every line, I would have to keep setting the width anyway.
So the two approaches are essentially equivalent, and I would like or dislike each of them depending on what I am doing at the moment.