Writing single-char vs. char const* to buffer - c++

When writing single characters to an output stream, the purist in me wants to use single quotes (e.g.):
unsigned int age{40};
std::ostringstream oss;
oss << "In 2022, I am " << age << '\n'; // 1. Single quotes around \n
oss << "In 2023, I will be " << age + 1u << "\n"; // 2. Minor ick--double quotes around \n
Because I'm writing a single character and not an arbitrary-length message, it doesn't seem necessary to provide a null-terminated string literal.
So I decided to measure the difference in speed. Naively, I'd expect option 1, the single-character version, to be faster (only one char, no need to handle \0). However, my test with Clang 13 on quick-bench indicates that option 2 is a hair faster. Is there an obvious reason for this?
https://quick-bench.com/q/3Zcp62Yfw_LMbh608cwHeCc0Nd4
Of course, if the program is spending a lot of time writing data to a stream anyway, chances are the program needs to be rethought. But I'd like to have a reasonably correct mental model, and because the opposite of what I expected happened, my model needs revising.

As you can see in the assembly and in the libc++ source here, both << operations ultimately call the same function, __put_character_sequence, which the compiler chose not to inline in either case.
So in the end you are passing a pointer to the single char object anyway, and any pointer-indirection overhead applies equally to both cases.
__put_character_sequence also takes the length of the string as an argument, which the compiler can easily evaluate at compile time for "\n" as well, so there is no advantage there either.
In the end it probably comes down to the compiler having to store the single character on the stack since without inlining it can't tell whether __put_character_sequence will modify it. (The string literal cannot be modified by the function and also would have the same identity between iterations of the loop.)
If the standard library used a different approach or the compiler did inline slightly differently, the result could easily be the other way around.
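To make the mechanism above concrete, here is a simplified model, not libc++'s actual code: `put_sequence`, `insert_char` and `insert_literal` are hypothetical names, and the real helper also handles padding, sentries and error state. Both paths end in the same (pointer, length) call; the single `char` has to be materialized in memory so its address can be taken, while the literal already lives in static storage and its length is a constant.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for an internal helper such as libc++'s
// __put_character_sequence: writes `len` characters starting at `data`.
std::ostream& put_sequence(std::ostream& os, const char* data, std::size_t len) {
    return os.write(data, static_cast<std::streamsize>(len));
}

// Model of operator<<(ostream&, char): the character must exist in memory
// (here, the parameter `c`) so that its address can be passed on. If the
// callee is not inlined, the compiler cannot assume it leaves `c` alone.
std::ostream& insert_char(std::ostream& os, char c) {
    return put_sequence(os, &c, 1);
}

// Model of inserting a string literal: the literal has static storage and
// its length (N - 1, excluding the terminating '\0') is a compile-time
// constant. The template parameter makes that explicit; the real overload
// takes const char* and the compiler constant-folds the length.
template <std::size_t N>
std::ostream& insert_literal(std::ostream& os, const char (&s)[N]) {
    return put_sequence(os, s, N - 1);
}
```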

Related

Would it be more efficient to have the \n character in the same string literal than to have an extra set of "<<"?

I was wondering if in code such as this:
std::cout << "Hello!" << '\n';
Would it be more efficient to have the \n character in the same string literal, as in this case, "Hello!"? Such as the code would then look like this:
std::cout << "Hello!\n";
I am aware that the efficiency difference between the two is minuscule (especially in this case) and doesn't really matter, but it has been boggling my mind.
My reasoning:
If I am not mistaken, having the \n character in the same string literal would be more efficient: with the extra insertion operator (operator<<), that function has to be called once again, and whatever its Standard Library implementation does will happen just for one single character (\n). If I append that character to the end of the string literal instead, only one call to operator<< is needed.
I guess this is a very simple question but I am not 100% sure about the answer and I love to spend time knowing the details of a language and how little things work better than others.
Update:
I am working to find the answer myself, since for questions such as this it is better to try to find the answer on your own. I haven't worked with any assembly code in the past, so this will be my first time trying.
Yes; but all of the ostreams are so inefficient in practice that if you care much about efficiency you shouldn't be using them.
Write code to be clear and maintainable. Making code faster takes work, and the simpler and clearer your code is, the easier it is to make it faster.
Identify what your bottleneck is, then optimize it by working out what is actually taking time. (This only fails when global behaviour causes a global slowdown, such as fragmentation or cache pollution.)
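In the spirit of measuring before optimizing, a minimal timing sketch with std::chrono. `time_it`, `two_inserts` and `one_insert` are hypothetical helpers; a loop like this has no warm-up and no statistics, so a framework such as Google Benchmark (which quick-bench uses) gives far more trustworthy numbers.

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Hypothetical helper: runs `f` `iterations` times and returns the elapsed
// wall-clock time. Not a rigorous benchmark (no warm-up, no statistics).
template <typename F>
std::chrono::nanoseconds time_it(F f, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// The two variants from the question, writing to a string stream so that
// console I/O does not dominate the measurement.
std::string two_inserts(int n) {
    std::ostringstream oss;
    for (int i = 0; i < n; ++i) oss << "Hello!" << '\n';
    return oss.str();
}

std::string one_insert(int n) {
    std::ostringstream oss;
    for (int i = 0; i < n; ++i) oss << "Hello!\n";
    return oss.str();
}
```

For example, `time_it([] { two_inserts(1000); }, 100)` and `time_it([] { one_insert(1000); }, 100)` can be compared, keeping in mind that run-to-run noise may well exceed the difference being measured.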

C++ When should I std::ctype<char>::widen()?

Apparently, writing a single character of type char to a stream whose char type is char is guaranteed by the standard to not invoke ctype<char>.widen() on the associated locale.
On the other hand, according to my reading of the standard (C++17), when writing a string of chars (const char*) instead of a single char, ctype<char>.widen() must be invoked.
I am struggling to understand how to make sense of this.
On one hand, the fact that widen() is required when writing strings suggests that there are valid scenarios where widen() has an effect. But if that is the case, then how can it be alright to omit the widening operation when writing single characters?
It seems to me that there must be an intended difference in the roles (domains of applicability) of the two operations, output of single char (char) and output of string (const char*), but I do not see what it is.
To make things more concrete, let us say that I wanted to implement an output operator for a range object, and have the output be in the form 0->2. My first inkling would be something like this:
std::ostream& operator<<(std::ostream& out, const Range& range)
{
    // ...
    out << "->"; // Invokes widen()
    // ...
}
But, is this how I am supposed to do it? Or would out << '-' << '>' (no widening) have been better / more correct?
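A compilable version of the two alternatives, assuming a hypothetical `Range` with two `int` bounds (the question does not show its members). With the default "C" locale both produce `0->2`; whether that holds for every locale is exactly what the question is asking, and this sketch does not settle it.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical Range type; the question only shows the output format "0->2".
struct Range {
    int first;
    int last;
};

// Form 1: the separator as a string literal (specified in terms of widen()).
std::ostream& operator<<(std::ostream& out, const Range& range) {
    return out << range.first << "->" << range.last;
}

// Form 2: the separator as two single chars (no widen() call specified
// for a char inserted into a char stream).
std::ostream& print_with_chars(std::ostream& out, const Range& range) {
    return out << range.first << '-' << '>' << range.last;
}
```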
Curiously, the formulation of the standard suggests to me that the two forms do not always produce the same result. Also, as far as I can tell, the latter form (with separate chars), could be much faster on some platforms.
What is the upshot? What are the rules that should guide me in choosing between the two types of output operations?
For reference, here is an earlier attempt of mine at posing the same question (3 years ago): C++ What is the role of std::ctype<char>::widen()?
Since the old question never got much traction, I'd prefer to mark that one as a duplicate of this one, rather than vice versa.
EDIT: I recognize that a good output operator might not want to use formatted output operations internally, but that is not what I am interested in here. I'm interested in the reasoning behind the difference in behavior of the two types of output operations.
EDIT: Here is one explanation that would make sense to me: << on single char is to be understood as a special case of << on std::string, and not as a special case of << on const char*. But, is this the right explanation? If so, I believe it means that I should use << "->" above. Not << '-' << '>'.
EDIT: Here is what makes me think that the explanation above (2nd EDIT) is not the right one: In the case of a wchar_t stream, both << on char and << on const char* invoke widen(), so from this point of view, they are in the same "family". So, from a consistency point of view, we should expect that when we switch stream type from wchar_t to char, either both of those operators should still invoke widen(), or both should not.
EDIT: Here is another kind of explanation, which I don't think is right, but I'll include it for exposition: For a char stream out, out << "->" has the same effect as out << '-' << '>', because even though the first form is required to invoke widen(), widen() is required to be a "no op" on a char stream in any locale (I don't believe this is the case). So, while there may be a significant difference in performance, the results are always the same. This would suggest that the difference in formulation of required behavior is a kind of unintended, but fairly benign accident. If this is the right explanation, then I should choose out << '-' << '>' due to the possibly much better performance.
EDIT: Ok, I found another 3-year-old question of mine, where I am coming at it from a slightly different angle: C++ When are characters widened in output stream operator<<()?. The comments from Dietmar Kühl suggest that widen() is always a "no op" on a char stream, and that the whole "issue" is due to imprecise wording in the standard. If so, that would make my second proposed explanation above (4th EDIT) correct. Still, it would be nice to get this corroborated by somebody else.
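One way to test the "widen() is a no-op" claim for a particular locale is to ask the facet directly. The sketch below (`widen_is_identity` is a hypothetical helper) checks every `char` value. For the standard `std::ctype<char>` facet, `do_widen(c)` is specified to return `c`; however, `do_widen` is a protected virtual member, so a program could install a derived facet that overrides it, and this check would detect that.

```cpp
#include <climits>
#include <locale>

// Returns true if ctype<char>::widen() maps every char value to itself in
// the given locale. This is a property of the installed facet, not of the
// stream: a user-derived facet overriding do_widen could make it false.
bool widen_is_identity(const std::locale& loc) {
    const auto& ct = std::use_facet<std::ctype<char>>(loc);
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        const char c = static_cast<char>(i);
        if (ct.widen(c) != c) return false;
    }
    return true;
}
```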

How can I stop cin.getline() from causing the console to repeatedly get user input when the delimiter is not found?

std::istream & Date::read(std::istream & istr)
{
    char* buffer = nullptr;
    const bool ISTREAM_IS_OKAY = !(istr.fail());//okay if it didn't fail
    if (ISTREAM_IS_OKAY)
    {
        cout << "Enter a string: ";
        const int SIZE = 256;
        buffer = new char[SIZE];
        istr.getline(buffer, SIZE);
        cout << "\n" << buffer << " " << strlen(buffer) << endl;
        istr.getline(buffer, SIZE, '/');
        cout << "\n" << buffer << " " << strlen(buffer) << endl;
        istr.getline(buffer, SIZE, '/');
        cout << "\n" << buffer << " " << strlen(buffer) << endl;
    }
    else
    {//CIN_FAILED is a pre-processor directive which is equal to 1
        m_readErrorCode = CIN_FAILED; //m_readErrorCode is just an int
    }
    delete[] buffer;
    return istr;
}
I am trying to read in a date in one string using cin.getline(). Dependent upon whether the boolean member variable m_dateOnly is true or false, the date is to be printed in one of the following two fashions:
1) if(m_dateOnly==true)....
2017/3/18
2) else...print the date and time
2017/3/18 , 12:36
I'm aware that the logic in my code does not entirely do what I just explained (it's still a WIP). I came to a halt because when I enter the following:
"abcd" ... no delimiter here
cin.getline() continues to run until the user enters a string with the given delimiter in it.
How can I get cin.getline() to stop on the first instance of an invalid string as opposed to it continuously running?
Note: I am required to use the istream passed as an argument
Thanks in advance!
Basically you can't, because getline will not stop until it encounters the terminator it expects, the buffer gets full or the input ends.
At any rate, you can't pass it a list of two or more characters (the expected terminator and/or some illegal characters) to stop on.
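The behaviour described above can be seen without a console by reading from a string stream, where end of input arrives immediately. `GetlineResult` and `getline_until` are hypothetical helpers. On an interactive console, end of input only comes when the user signals EOF (typically Ctrl+D on Unix-like systems, Ctrl+Z then Enter on Windows), so until then getline simply keeps waiting for more characters, newlines included.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct GetlineResult {
    std::string text;  // what ended up in the buffer
    bool eof;          // input ended before the delimiter was found
    bool fail;         // failbit/badbit set (nothing extracted, or buffer full)
};

// istream::getline(buf, n, delim) stops only at the delimiter (extracted,
// not stored), at end of input, or after storing n - 1 characters.
GetlineResult getline_until(std::istream& in, char delim) {
    char buffer[256];
    in.getline(buffer, sizeof buffer, delim);
    return {buffer, in.eof(), in.fail()};
}
```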
If you really want your code to react on a character-by-character basis, you will need to use character-by-character input, with stream buffer methods like sgetc or sbumpc.
I would not advise doing so, because that would force you to handle all the pesky edge cases, like your input buffer getting full or the input being terminated, which getline handles without headache.
You could also use the >> operator to grab bits of characters or numbers according to whatever format is expected for your date and time. Trouble is, that would force you to check the state of your input stream after each >> invocation, making for ponderous and nigh unreadable code.
Another possibility is to use scanf-like functions, but they have the slight downside of undefined behaviour on numeric input, meaning that typing a large number of digits when a number is expected could theoretically lead to a program crash, a random memory corruption or your mustache turning pink.
Yet another possibility is to piss away a couple dozen lines of code creating your own homemade list of separators through the imbue method and a custom ctype object. I would not touch that with a 10-foot pole, but I'm sure a lot of senior developers pull that trick to impress the chicks...
Now if you ask me, C++ string I/O is an appallingly awkward leftover from the '90s: no regular expressions, no garbage collection, no associative memory, so you will end up checking the characters you just read, monitoring the state of your I/O stream and allocating bits of buffers every second line of code. You're bound to suffer one way or another. I would just not make it more painful than it has to be, if I were you.
The usual way of circumventing the crappy C++ I/O is to read a plain line (terminated by a good old \n, usually what you get when you hit the enter key), and then analyze the resulting string buffer by hand. Once you're done with reading an actual input, you don't have to worry about buffers overflowing or input terminating at an awkward moment. That usually makes things a lot less messy.
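A sketch of that read-a-line-then-parse approach for the date format in the question, under assumptions: `DateTime` and `read_date_line` are hypothetical names, the input is `YYYY/M/D` optionally followed by `, HH:MM`, and the line is parsed with an `std::istringstream` rather than character by character. It reads from whatever istream is passed in, and a malformed line such as `abcd` is rejected as soon as Enter is pressed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: the question's date, optionally with a time.
struct DateTime {
    int year = 0, month = 0, day = 0;
    bool hasTime = false;
    int hour = 0, minute = 0;
};

// Reads one whole line, then parses it. Returns false (leaving `out`
// unchanged) if there is no line or the line is malformed.
bool read_date_line(std::istream& istr, DateTime& out) {
    std::string line;
    if (!std::getline(istr, line)) return false;      // no input at all

    std::istringstream ss(line);
    DateTime dt;
    char s1 = 0, s2 = 0;
    if (!(ss >> dt.year >> s1 >> dt.month >> s2 >> dt.day) || s1 != '/' || s2 != '/')
        return false;

    char comma = 0, colon = 0;
    if (ss >> comma) {                                 // something follows the date
        if (comma != ',' || !(ss >> dt.hour >> colon >> dt.minute) || colon != ':')
            return false;
        dt.hasTime = true;
    }
    out = dt;
    return true;
}
```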
By the way, my personal preference is to never have to call delete on a null pointer. You can do it, but it makes for pretty dangerous code that tends to break if you modify it one time too many. It could arguably save you a few minutes of coding, but might also cost you (or one of your unfortunate coworkers) a few hours of debugging a few weeks or months later.
If your buffer is only used within a code block, better make it a local variable that will be cleaned up automatically. Use dynamic allocation only when you really need it.
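Applied to the question's code, a local array removes the need for `new[]`/`delete[]` entirely. `read_first_line` is a hypothetical helper that keeps the question's `SIZE` constant and `getline` call.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Same getline call as in the question, but the buffer has automatic
// storage, so it is released on every return path without delete[].
std::string read_first_line(std::istream& istr) {
    const int SIZE = 256;
    char buffer[SIZE];
    buffer[0] = '\0';             // defined contents even if istr already failed
    istr.getline(buffer, SIZE);
    return buffer;                // copied into a std::string
}
```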
No doubt a lot of C++ zealots will be eager to explain the contrary, but this bit of wisdom comes from long nights spent munching pizzas in front of buggy code, often written by people who were just a bit too smart for their own good (and the good of their coworkers, incidentally). Make of it what you will; it comes free of charge.

Why an insertion in streams makes its width reset to 0?

It seems an insertion causes a stream to reset its width to 0. In other words, you need to call width() again whenever you want to align your output, which makes me wonder why. I've looked over the C++11 standard and found a partial answer: according to the ISO C++ standard §27.7.3.6.4.1, width(0) is called inside the character inserters. However, those are only a subset of the inserters, so that is not a complete answer. What about the arithmetic inserters? I couldn't find any other references that explain this behavior of streams.
Because that's the way the streams were specified. In an << operator,
width should be reset; the other formatting options should not. I can
only guess as to why: presumably, for things like:
std::cout << std::setw(8) << someValue << " cm";
You wouldn't want the width to apply to the string " cm", for example.
But you would want to be able to write:
std::cout << std::setw(8) << price << " | " << setw(20) << article;
where article is a string.
Except that, of course, for that to work, you'd have to also change the
justification before the string, and the change in justification would
be sticky, affecting the next numeric output.
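A small demonstration of the behaviour discussed above: the width set by `setw` is consumed by the next formatted insertion, while an adjustment flag such as `std::left` sticks. `width_demo` is a hypothetical name; the trailing comments show the text each statement contributes.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// setw() applies only to the next formatted insertion, which then calls
// width(0); flags such as std::left stay in effect until changed.
std::string width_demo() {
    std::ostringstream os;
    os << std::setw(5) << 42 << 7 << '|';           // "   427|": 7 is not padded
    os << std::left << std::setw(4) << "ab" << '|'; // "ab  |"
    os << std::setw(4) << 9 << '|';                 // "9   |": left is still set
    return os.str();
}
```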
In practice, of course, experienced programmers don't write this sort of
code. They'd use something like:
std::cout << price(column1Width) << article.price
<< " | " << label(column2Width) << article.name;
(supposing that they still had to generate tables using a fixed width
font). Where price and label were manipulators which set any number
of format flags, and restored them in their destructor. (Since they are
temporaries, their destructor will be called at the end of the full
expression.) This way, this particular line of code doesn't say
anything about the physical formatting, but rather that the first value
should be formatted as a price, and the second as a label. (And of
course, if someone higher up later decides that prices or labels should
be formatted differently, you just change the manipulators, rather than
searching all of the output statements, and trying to figure out which
are prices, and which aren't.)
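A rough sketch of how such manipulators could be written, under stated assumptions: `format_guard` is a hypothetical class, `price` and `label` only mimic the names in the example, and the chosen formats (two decimals right-aligned for prices, left-aligned text for labels) are illustrative. It relies on C++17, which sequences the operands of `<<` left to right, so that several guards in one expression are constructed in insertion order and destroyed in reverse.

```cpp
#include <ios>
#include <ostream>
#include <sstream>

// When inserted, saves the stream's flags and precision, applies its own
// formatting and width, and restores the saved state in its destructor.
// Being a temporary, it is destroyed at the end of the full expression.
class format_guard {
public:
    format_guard(int width, std::ios_base::fmtflags adjust,
                 std::ios_base::fmtflags floatfield, std::streamsize precision)
        : width_(width), adjust_(adjust), floatfield_(floatfield), precision_(precision) {}

    ~format_guard() {
        if (os_) {                          // only if it was actually inserted
            os_->flags(saved_flags_);
            os_->precision(saved_precision_);
        }
    }

    friend std::ostream& operator<<(std::ostream& os, const format_guard& g) {
        g.os_ = &os;
        g.saved_flags_ = os.flags();
        g.saved_precision_ = os.precision();
        os.setf(g.adjust_, std::ios_base::adjustfield);
        os.setf(g.floatfield_, std::ios_base::floatfield);
        os.precision(g.precision_);
        os.width(g.width_);                 // consumed by the next value only
        return os;
    }

private:
    int width_;
    std::ios_base::fmtflags adjust_, floatfield_;
    std::streamsize precision_;
    mutable std::ostream* os_ = nullptr;
    mutable std::ios_base::fmtflags saved_flags_{};
    mutable std::streamsize saved_precision_ = 0;
};

// Hypothetical manipulators named after the example above.
inline format_guard price(int width) {
    return format_guard(width, std::ios_base::right, std::ios_base::fixed, 2);
}

inline format_guard label(int width) {
    return format_guard(width, std::ios_base::left, std::ios_base::fmtflags{}, 6);
}
```

For example, `std::cout << price(10) << 19.5 << " | " << label(20) << "pen";` pads the price and the label, and once the statement ends `std::cout`'s flags and precision are back to what they were.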
EDIT (added references to the standard):
It's important to note that the standard cannot cover everything here,
since most of the time, you'll be using custom operator<<, written by
the author of the class you're outputting. Most of the built-in
operator<< are covered by §22.4.2.2.2, in its description of stage 3:
If str.width() is nonzero and the number of charT’s in the sequence
after stage 2 is less than str.width(), then enough fill characters are
added to the sequence at the position indicated for padding to bring the
length of the sequence to str.width().
str.width(0) is called.
For characters and C style strings, this is specified (in much the same
way) in §27.7.3.6.4. For std::string, see §21.4.8.8.
For std::complex: the standard defines insertion in terms of other
insertion operators: the real and imaginary parts are written to a
temporary ostringstream (which copies the flags, locale and precision,
but not the width), and the resulting "(re,im)" string is then
inserted. Any setting of width thus pads the text as a whole.
(Practically speaking, I think we can consider this broken. When I
implemented a pre-standard complex class, my << checked the width, and
if it was non-zero, subtracted 3 for the non-numeric fields, then
divided by 2 and set it before outputting each double. I'm not sure
that this is right, but it's certainly better than what the standard
specifies. And I also used a semicolon as separator if the decimal was
a comma, as it is in most places I've lived.)
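To see how a given library treats width for `std::complex`, one can insert a value with `setw` and inspect the text. The standard describes `operator<<` for `std::complex` as if the parts were first written to a temporary `ostringstream` (copying the flags, locale and precision, but not the width) and the resulting string were then inserted, so with libstdc++ and libc++ the padding applies to the whole `(re,im)` text. `complex_with_width` is a hypothetical helper.

```cpp
#include <complex>
#include <iomanip>
#include <sstream>
#include <string>

// Inserts a complex number under a field width and returns the text, with
// a '|' marker appended to show where the padded field ends.
std::string complex_with_width(std::complex<double> z, int width) {
    std::ostringstream os;
    os << std::setw(width) << z << '|';
    return os.str();
}
```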
The fact that the other formatting options remain unchanged is because
the standard doesn't specify anything, and additional side effects,
other than those specified, are assumed to be forbidden.
In all cases, the specification says that stream.width(0) is called.
But in the standard, each case is specified separately. There is no
specification (even for intent) of any general rule that I can find.
Traditionally, however, the rule has always been: reset width, and leave
the others unchanged. The principle of least surprise says that you
should do this for user defined << (and >>) as well.
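Following that least-surprise rule for a user-defined type, one simple approach, sketched here with a hypothetical `Point`, is to format the value into a string first and insert the string: the `std::string` inserter pads to the current width and then calls `width(0)`, leaving the other flags untouched. Copying the caller's other flags into the temporary stream (for instance with `copyfmt` followed by `width(0)`) is omitted for brevity.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>

// Hypothetical user type.
struct Point {
    int x;
    int y;
};

// The width applies to "(x,y)" as a whole and is reset afterwards by the
// std::string inserter; adjustment, fill, etc. are not modified here.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    std::ostringstream tmp;
    tmp << '(' << p.x << ',' << p.y << ')';
    return os << tmp.str();
}
```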

Refactoring C-style pretty-printing into C++-style pretty-printing

I want to refactor some printf/sprintf/fprintf statements into ostream/sstream/fstream statements. The code in question pretty-prints a series of integers and floating-point numbers, using whitespace padding and fixed numbers of decimal points.
It seems to me that this would be a good candidate for a Martin Fowler style writeup of a safe, step-by-step refactorings, with important gotchas noted. The first step, of course, is to get the legacy code into a test harness, which I have done.
What slow and careful steps can I take to perform this refactoring?
If refactoring is not the goal in itself, you can avoid it altogether (well, almost) by using a formatting library such as tinyformat which provides an interface similar to printf but is type safe and uses IOStreams internally.
Basic mechanics of the conversion:
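Convert each printf-style clause %w.pf, where w is the field width and p is the number of digits of precision, into << setw(w) << setprecision(p) << fixed, and each %w.pe into << setw(w) << setprecision(p) << scientific. Unlike printf clauses, fixed, scientific and setprecision stay in effect for later output on the same stream; only the width is reset after each value.
Convert each printf-style clause %wd or %wi, where w is the field width, into << setw(w).
Convert "\n" to endl where appropriate (note that endl also flushes the stream).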
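A sketch of the mechanics above for one hypothetical format string; `legacy_format` and `stream_format` are illustrative names, and `snprintf` is used in place of `sprintf` to bound the buffer. Comparing the two outputs, as the test does, is the kind of check the harness can run after each converted clause.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Legacy form: printf-style formatting into a buffer.
std::string legacy_format(double x, double y, int n) {
    char text[64];
    std::snprintf(text, sizeof text, "%10.3f %12.2e %5d\n", x, y, n);
    return text;
}

// Converted form, clause by clause:
//   %10.3f -> setw(10) << setprecision(3) << fixed
//   %12.2e -> setw(12) << setprecision(2) << scientific
//   %5d    -> setw(5)
// fixed/scientific and precision persist on `os`; only the width resets.
std::string stream_format(double x, double y, int n) {
    std::ostringstream os;
    os << std::setw(10) << std::setprecision(3) << std::fixed << x << ' '
       << std::setw(12) << std::setprecision(2) << std::scientific << y << ' '
       << std::setw(5) << n << '\n';
    return os.str();
}
```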
Process for printf:
Create a char[] (let's call it text) with enough total width.
Convert the printf(...) to sprintf(text, ...), and use cout << text to actually print the text.
Complete using the common instructions.
Process for fprintf:
Same as printf, but use the appropriate fstream instead of cout.
If you already have an opened C-style FILE object that you do not want to refactor at this time, it gets a little sticky (but can be done).
Complete using the common instructions.
Process for sprintf:
If the string being written to is only used to output to a stream in the current context, refer to one of the two refactorings above.
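Otherwise, begin by creating a stringstream and streaming into it what you were writing into the char[]. If you still need a C string afterwards, store the result first (std::string text = ss.str();) and use text.c_str(), which gives a const char*: std::stringstream::str() returns a temporary, so a pointer obtained from ss.str().c_str() dangles once the statement ends.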
Complete using the common instructions.
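For the sprintf case where a C string is still needed afterwards, a sketch with hypothetical `format_value`, `legacy_consumer` and `use_formatted` (the `%6.2f` format is illustrative): the `std::string` returned by `str()` is a temporary, so it is kept in a named variable before `c_str()` is taken.

```cpp
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Replaces sprintf(buf, "%6.2f", value).
std::string format_value(double value) {
    std::stringstream ss;
    ss << std::setw(6) << std::setprecision(2) << std::fixed << value;
    return ss.str();
}

// Hypothetical legacy function that still takes a C string.
std::size_t legacy_consumer(const char* s) {
    return std::strlen(s);
}

std::size_t use_formatted(double value) {
    const std::string text = format_value(value);  // owns the characters
    return legacy_consumer(text.c_str());          // valid while `text` lives
}
```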
Common instructions:
Convert each clause one by one into C++-style.
Remove *printf and char[] declarations as necessary when finished.
Apply other refactorings, particularly "Extract Method" (Fowler, Refactoring) as necessary.