I recently was bitten by the fact that ios_base::width and/or the setw manipulator have to be reset with every item written to the stream.
That is, you must do this:
while(whatever)
{
mystream << std::setw(2) << myval;
}
Rather than this:
mystream.width(2);
while(whatever)
{
mystream << myval;
}
Ok, fine.
But does anyone know why this design decision was made?
Is there some rationale that I'm missing, or is this just a dark corner of the standard?
Other stream formatting modifiers (as mentioned in the linked SO question) are 'sticky', while setw is not.
The decisions of which manipulators should affect only the next operation seem to be based on logical and empirical observations about what tends to factor common functional needs better, and hence be easier for the programmer to write and get right.
The following points strike me as relevant:
- some_stream << x should just work right most of the time
- most code that sets the width will immediately or very shortly afterwards stream the value, so unrelated code can assume there won't be some "pending" width value affecting its output
- setfill() is not relevant unless there's a pending setw(), so won't adversely affect the some_stream << x statement topping our list
- only when width is being explicitly set, the programmer can/must consider whether the fill character state is appropriate too, based on their knowledge of the larger calling context
- it's very common for a set of values to use the same fill character
- other manipulators like hex and oct are persistent, but their use is typically in a block of code that either pops the prior state or (nasty but easier) sets it back to decimal
The point leading from this that answers your question...
if setw() were persistent, it would need to be reset between each streaming statement to prevent unwanted fill...
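To make the difference concrete, here is a minimal sketch of the behaviour described above. The function names are made up for illustration; the manipulators are the standard ones from <iomanip>.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Width applies to one insertion only; the fill character persists.
std::string width_is_one_shot() {
    std::ostringstream out;
    out << std::setfill('0');
    out << std::setw(3) << 7 << ',' << 8;   // 7 is padded; ',' and 8 are not
    out << ',' << std::setw(3) << 9;        // fill is still '0', width set again
    return out.str();                       // "007,8,009"
}

// The numeric base, by contrast, stays in effect until it is changed.
std::string base_is_sticky() {
    std::ostringstream out;
    out << std::hex << 255 << ' ' << 16;    // both values are printed in hex
    return out.str();                       // "ff 10"
}
```

Note that the arithmetic inserter resets the width just like the character inserter does, so the 8 above is printed unpadded.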
The way I see it: you can always do something like the following if you want the width to be applied uniformly.
int width = 2;
while(whatever)
{
mystream << std::setw(width) << myval;
}
but if it was sticky as you mention:
mystream.width(2);
while(whatever)
{
mystream << myval;
}
and if I wanted a different width on every line, I would have to keep setting the width anyway.
So essentially both approaches are almost the same, and I would like or dislike them depending on what I am doing at the moment.
When writing single characters to an output stream, the purist in me wants to use single quotes (e.g.):
unsigned int age{40};
std::ostringstream oss;
oss << "In 2022, I am " << age << '\n'; // 1. Single quotes around \n
oss << "In 2023, I will be " << age + 1u << "\n"; // 2. Minor ick--double quotes around \n
Because I'm writing a single character and not an arbitrary-length message, it doesn't seem necessary to have to provide a null-terminated string literal.
So I decided to measure the difference in speed. Naively, I'd expect option 1, the single-character version, to be faster (only one char, no need to handle \0). However, my test with Clang 13 on quick-bench indicates that option 2 is a hair faster. Is there an obvious reason for this?
https://quick-bench.com/q/3Zcp62Yfw_LMbh608cwHeCc0Nd4
Of course, if the program is spending a lot of time writing data to a stream anyway, chances are the program needs to be rethought. But I'd like to have a reasonably correct mental model, and because the opposite happened wrt what I expected, my model needs to be revised.
As you can see in the assembly and in the libc++ source here, both << operations in the end call the same function __put_character_sequence which the compiler decided to not inline in either case.
So, in the end you are passing a pointer to the single char object anyway and if there is a pointer indirection overhead it applies equally to both cases.
__put_character_sequence also takes the length of the string as an argument, which the compiler can easily evaluate at compile time for "\n" as well. So there is no benefit there either.
In the end it probably comes down to the compiler having to store the single character on the stack since without inlining it can't tell whether __put_character_sequence will modify it. (The string literal cannot be modified by the function and also would have the same identity between iterations of the loop.)
If the standard library used a different approach or the compiler did inline slightly differently, the result could easily be the other way around.
Apparently, writing a single character of type char to a stream whose char type is char is guaranteed by the standard to not invoke ctype<char>.widen() on the associated locale.
On the other hand, according to my reading of the standard (C++17), when writing a string of chars (const char*) instead of a single char, ctype<char>.widen() must be invoked.
I am struggling to understand how to make sense of this.
On one hand, the fact, that widen() is required when writing strings, suggests that there are valid scenarios where widen() has an effect. But if that is the case, then how can it be alright to omit the widening operation when writing single characters?
It seems to me that there must be an intended difference in the roles (domains of applicability) of the two operations, output of single char (char) and output of string (const char*), but I do not see what it is.
To make things more concrete, let us say that I wanted to implement an output operator for a range object, and have the output be of the form 0->2. My first inkling would be something like this:
std::ostream& operator<<(std::ostream& out, const Range& range)
{
// ...
out << "->"; // Invokes widen()
// ...
}
But, is this how I am supposed to do it? Or would out << '-' << '>' (no widening) have been better / more correct?
Curiously, the formulation of the standard suggests to me that the two forms do not always produce the same result. Also, as far as I can tell, the latter form (with separate chars), could be much faster on some platforms.
What is the upshot? What are the rules that should guide me in choosing between the two types of output operations?
For reference, here is an earlier attempt of mine at posing the same question (3 years ago): C++ What is the role of std::ctype<char>::widen()?
Since the old question never got much traction, I'd prefer to mark that one as a duplicate of this one, rather than vice versa.
EDIT: I recognize that a good output operator might not want to use formatted output operations internally, but that is not what I am interested in here. I'm interested in the reasoning behind the difference in behavior of the two types of output operations.
EDIT: Here is one explanation that would make sense to me: << on single char is to be understood as a special case of << on std::string, and not as a special case of << on const char*. But, is this the right explanation? If so, I believe it means that I should use << "->" above. Not << '-' << '>'.
EDIT: Here is what makes me think that the explanation above (2nd EDIT) is not the right one: In the case of a wchar_t stream, both << on char and << on const char* invoke widen(), so from this point of view, they are in the same "family". So, from a consistency point of view, we should expect that when we switch stream type from wchar_t to char, either both of those operators should still invoke widen(), or both should not.
EDIT: Here is another kind of explanation, which I don't think is right, but I'll include it for exposition: For a char stream out, out << "->" has the same effect as out << '-' << '>', because even though the first form is required to invoke widen(), widen() is required to be a "no op" on a char stream in any locale (I don't believe this is the case). So, while there may be a significant difference in performance, the results are always the same. This would suggest that the difference in formulation of required behavior is a kind of unintended, but fairly benign accident. If this is the right explanation, then I should choose out << '-' << '>' due to the possibly much better performance.
EDIT: Ok, I found another 3 year old question from myself, where I am coming at it from a slightly different angle: C++ When are characters widened in output stream operator<<()?. The comments from Dietmar Kühl suggest that widen() is always a "no op" on a char stream, and the whole "issue" is due to imprecise wording in the standard. If so, it would render my second proposed explanation above correct (4th EDIT). Still, it would be nice to get this corroborated by somebody else.
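For what it is worth, the "no op" reading is easy to check empirically. The sketch below uses the classic "C" locale and made-up function names, and it only shows what mainstream implementations do in that locale; it does not settle what the standard's wording requires.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Output via the const char* inserter (the form that is specified to widen).
std::string literal_output() {
    std::ostringstream out;
    out << "->";
    return out.str();
}

// Output via two single-char inserters (the form that is specified not to widen).
std::string char_output() {
    std::ostringstream out;
    out << '-' << '>';
    return out.str();
}

// Ask the ctype<char> facet of the classic locale directly what widen() does.
char widen_in_classic_locale(char c) {
    const std::locale& loc = std::locale::classic();
    return std::use_facet<std::ctype<char>>(loc).widen(c);
}
```

On the implementations I would expect people to use (libstdc++, libc++, MSVC), both functions produce "->" and widen('-') gives back '-'.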
std::istream & Date::read(std::istream & istr)
{
    char* buffer = nullptr;
    const bool ISTREAM_IS_OKAY = !(istr.fail());//okay if it didn't fail
    if (ISTREAM_IS_OKAY)
    {
        cout << "Enter a string: ";
        const int SIZE = 256;
        buffer = new char[SIZE];
        istr.getline(buffer, SIZE);
        cout << "\n" << buffer << " " << strlen(buffer) << endl;
        istr.getline(buffer, SIZE, '/');
        cout << "\n" << buffer << " " << strlen(buffer) << endl;
        istr.getline(buffer, SIZE, '/');
        cout << "\n" << buffer << " " << strlen(buffer) << endl;
    }
    else
    {//CIN_FAILED is a pre-processor directive which is equal to 1
        m_readErrorCode = CIN_FAILED; //m_readErrorCode is just an int
    }
    delete[] buffer;
    return istr;
}
I am trying to read in a date in one string using cin.getline(). Dependent upon whether the boolean member variable m_dateOnly is true or false, the date is to be printed in one of the following two fashions:
1) if(m_dateOnly==true)....
2017/3/18
2) else...print the date and time
2017/3/18 , 12:36
I'm aware that the logic in my code does not entirely reflect what I just explained (it's still a WIP). I came to a halt because when I enter the following:
"abcd" ... no delimiter here
cin.getline() continues to run until the user enters a string with the given delimiter in it.
How can I get cin.getline() to stop on the first instance of an invalid string as opposed to it continuously running?
Note: I am required to use the istream passed as an argument
Thanks in advance!
Basically you can't, because getline will not stop until it encounters the terminator it expects, the buffer gets full or the input ends.
At any rate, you can't pass it a list of 2 characters or more (the expected terminator and/or some illegal characters) it should stop on.
If you really want your code to react on a character per character basis, you will need to use character by character input, with methods like sgetc or sbumpc.
I would not advise doing so, because that would force you to handle all the pesky edge cases like your input buffer getting full or the input being terminated, which getline can handle without headache.
You could also use the >> operator to grab bits of characters or numbers according to whatever format is expected for your date and time. Trouble is, that would force you to check the state of your input stream after each >> invocation, making for ponderous and nigh unreadable code.
Another possibility is to use scanf like functions, but they have the slight downside of including an undefined behaviour on numeric inputs, meaning typing a large number of digits when it expects a number could theoretically lead to a program crash, a random memory corruption or your mustache turning pink.
Yet another possibility is to piss a couple dozen lines of code to create your own homemade list of separators through the imbue method and a custom ctype object. I would not touch that with a 10-foot pole, but I'm sure a lot of senior developers pull that trick to impress the chicks...
Now if you ask me, C++ string I/O is an appallingly awkward leftover from the 90's: no regular expressions, no garbage collection, no associative memory, so you will end up checking the characters you just read, monitoring the state of your I/O stream and allocating bits of buffers every second line of code. You're bound to suffer one way or another. I would just not make it more painful than it has to be, if I were you.
The usual way of circumventing the crappy C++ I/O is to read a plain line (terminated by a good old \n, usually what you get when you hit the enter key), and then analyze the resulting string buffer by hand. Once you're done with reading an actual input, you don't have to worry about buffers overflowing or input terminating at an awkward moment. That usually makes things a lot less messy.
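A rough sketch of that read-a-line-then-parse approach, applied to the date-only format from the question. ParsedDate and read_date_line are names I made up for the example (the asker's Date class is not shown in full), and the validation is deliberately minimal.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical plain struct standing in for the asker's Date members.
struct ParsedDate {
    int year = 0;
    int month = 0;
    int day = 0;
};

// Reads exactly one line from `in` and tries to parse it as year/month/day.
// Returns false (leaving `out` untouched) if the line is missing or malformed,
// so an input such as "abcd" fails immediately instead of waiting for a '/'.
bool read_date_line(std::istream& in, ParsedDate& out) {
    std::string line;
    if (!std::getline(in, line)) {
        return false;                       // no input at all
    }
    std::istringstream fields(line);        // parse the line, not the stream
    ParsedDate d;
    char sep1 = 0;
    char sep2 = 0;
    if (!(fields >> d.year >> sep1 >> d.month >> sep2 >> d.day)) {
        return false;                       // not enough numbers or separators
    }
    if (sep1 != '/' || sep2 != '/') {
        return false;                       // wrong separator characters
    }
    out = d;
    return true;
}
```

A bad line only spoils the temporary istringstream, so the caller's stream stays usable for the next attempt. Using std::string here also removes the need for the new[]/delete[] pair.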
btw, my personal preference goes to never having to call delete on a null pointer. You can do it, but that makes for pretty dangerous code that tends to break if you modify it one time too many. It could arguably save you a few minutes of coding, but might also cost you (or one of your unfortunate coworkers) a few hours of debugging a few weeks/months later.
If your buffer is only used within a code block, better make it a local variable that will be cleaned up automatically. Use dynamic allocation only when you really need it.
No doubt a lot of C++ zealots will be eager to explain the contrary, but this bit of wisdom comes from long nights spent munching pizzas in front of buggy code, often written by people who were just a bit too smart for their own good (and the good of their coworkers, incidentally). Make what you want of it, it comes free of charge.
It seems an insertion causes a stream to reset its width to 0. In other words, you need to call width() again every time you want to align your output, which makes me wonder why. I've looked over the C++11 standard and found a partial answer: according to §27.7.3.6.4.1, width(0) is called inside the character inserters. However, those are just a subset of the inserters, so this is not a complete answer. What about the arithmetic inserters? I couldn't find any other references that explain this behavior of streams.
Because that's the way the streams were specified. In an << operator,
width should be reset, the other formatting options no. I can only
guess as to why: presumably, for things like:
std::cout << std::setw(8) << someValue << " cm";
You wouldn't want the width to apply to the string " cm", for example.
But you would want to be able to write:
std::cout << std::setw(8) << price << " | " << std::setw(20) << article;
where article is a string.
Except that, of course, for that to work, you'd have to also change the
justification before the string, and the change in justification would
be sticky, affecting the next numeric output.
In practice, of course, experienced programmers don't write this sort of
code. They'd use something like:
std::cout << price(column1Width) << article.price
<< " | " << label(column2Width) << article.name;
(supposing that they still had to generate tables using a fixed width
font). Where price and label were manipulators which set any number
of format flags, and restored them in their destructor. (Since they are
temporaries, their destructor will be called at the end of the full
expression.) This way, this particular line of code doesn't say
anything about the physical formatting, but rather that the first value
should be formatted as a price, and the second as a label. (And of
course, if someone higher up later decides that prices or labels should
be formatted differently, you just change the manipulators, rather than
searching all of the output statements, and trying to figure out which
are prices, and which aren't.)
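A manipulator of the kind described above could be sketched roughly as follows. price_format is a hypothetical name, and the particular flags it sets (fixed, two decimals, right-aligned) are my own assumptions about what "formatted as a price" might mean; the point is only the save-in-operator<<, restore-in-destructor pattern.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>

// Hypothetical "semantic" manipulator: formats the next value as a price and
// restores the stream's previous flags and precision when the temporary dies.
class price_format {
public:
    explicit price_format(int width) : width_(width) {}
    price_format(const price_format&) = delete;
    price_format& operator=(const price_format&) = delete;

    ~price_format() {
        if (stream_ != nullptr) {           // only if it was actually inserted
            stream_->flags(saved_flags_);
            stream_->precision(saved_precision_);
        }
    }

    friend std::ostream& operator<<(std::ostream& os, const price_format& p) {
        p.stream_ = &os;                    // remember what to restore, and where
        p.saved_flags_ = os.flags();
        p.saved_precision_ = os.precision();
        os << std::fixed << std::right << std::setprecision(2)
           << std::setw(p.width_);          // width is one-shot anyway
        return os;
    }

private:
    int width_;
    mutable std::ios_base* stream_ = nullptr;
    mutable std::ios_base::fmtflags saved_flags_{};
    mutable std::streamsize saved_precision_ = 0;
};
```

Used as out << price_format(8) << 3.14159 << " | " << "tea";, the temporary lives until the end of that full expression, so a later out << 1.5 is printed with the stream's original settings again.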
EDIT (added references to the standard):
It's important to note that the standard cannot cover everything here,
since most of the time, you'll be using custom operator<<, written by
the author of the class you're outputting. Most of the built-in
operator<< are covered by §22.4.2.2.2, in its description of stage 3:
If str.width() is nonzero and the number of charT’s in the sequence
after stage 2 is less than str.width(), then enough fill characters are
added to the sequence at the position indicated for padding to bring the
length of the sequence to str.width().
str.width(0) is called.
For characters and C style strings, this is specified (in much the same
way) in §27.7.3.6.4. For std::string, see §21.4.8.8.
For std::complex: the standard defines insertion in terms of other
insertion operators. Any setting of width will thus affect only the
real element. (Practically speaking, I think we can consider this
broken. When I implemented a pre-standard complex class, my <<
checked the width, and if it was non-zero, subtracted 3 for the
non-numeric fields, then divided by 2 and set it before outputting each
double. I'm not sure that this is right, but it's certainly better than
what the standard specifies. And I also used a semi-colon as separator
if the decimal was a comma, as it is in most places I've lived.)
The fact that the other formatting options remain unchanged is because
the standard doesn't specify anything, and additional side effects,
other than those specified, are assumed to be forbidden.
In all cases, the specification says that stream.width(0) is called.
But in the standard, each case is specified separately. There is no
specification (even for intent) of any general rule that I can find.
Traditionally, however, the rule has always been: reset width, and leave
the others unchanged. The principle of least surprise says that you
should do this for user defined << (and >>) as well.
I am pretty sure all of you are familiar with the concept of the Big4, and I have several things to print in each of the constructor, copy constructor, assignment operator, and destructor.
The restriction is this:
I CAN'T use more than one newline (e.g., \n or std::endl) in any method
I can have a method called print, so I am guessing print is where I will put that precious one and only '\n'. My problem is: how can a single print method print the different things I want to print in each of the Big4? Any ideas? Maybe overloading the Big4?
Maybe I don't understand the question completely because it is asked rather awkwardly, but can't you just have a function called newline that receives an ostream as an argument, and then simply prints '\n' to that output stream? Then you can just call that infinitely many times, while still abiding by the arbitrary "one newline" rule.
e.g.
(edit: code removed, "smells like homework")
print should take a parameter containing the information to output to the screen (sans '\n'), then write it with the C++ output operator and append the '\n' to the passed-in information in the same statement.
note: no code 'cause this smells like homework to me...
I'm not sure I completely understand what you're trying to accomplish. Why is it that you can only use one newline? What makes it difficult to just write your code with only one newline in it? For example, I've done stuff like this before.
for(int i = 0; i < 10; i++) {
cout << i << " ";
}
cout << std::endl;
If you need something more complicated, you might want to make some sort of print tracker object that keeps a flag for whether a newline has been printed, and adjusts its behavior accordingly. This seems like it might be a little overly complicated though.