Why is there no overload for printing `std::byte`? - c++

The following code does not compile in C++20:
#include <iostream>
#include <cstddef>

int main() {
    std::byte b{65};
    std::cout << "byte: " << b << '\n'; // Missing overload
}
When std::byte was added in C++17, why was there no corresponding operator<< overload for printing it? I can maybe understand the choice of not printing containers, but why not std::byte? It tries to act as a primitive type, and we even have overloads for std::string, the recent std::string_view, perhaps the most closely related std::complex, and even std::bitset can be printed.
There are also std::hex and similar modifiers, so printing 0-255 by default should not be an issue.
Was this just an oversight? And what about operator>>? std::bitset has one, and it is not trivial at all.
EDIT: Found out even std::bitset can be printed.

From the paper on std::byte (P0298R3): (emphasis mine)
Design Decisions
std::byte is not an integer and not a character
The key motivation here is to make byte a distinct type – to improve program safety by leveraging the type system. This leads to the design that std::byte is not an integer type, nor a character type. It is a distinct type for accessing the bits that ultimately make up object storage.
As such, it is not required to be implicitly convertible to, or interpreted as, either a char or any integral type whatsoever, and hence it cannot be printed with std::cout unless explicitly cast to the required type.
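A minimal sketch of such an explicit cast, using std::to_integer from <cstddef> (the formatting choices are illustrative only):
#include <cstddef>
#include <iostream>

int main() {
    std::byte b{65};
    // Convert explicitly before streaming; std::byte itself has no operator<<.
    std::cout << "byte: " << std::to_integer<int>(b) << '\n'; // prints 65
    std::cout << "byte: " << static_cast<char>(b) << '\n';    // prints A
}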
Furthermore, this question might help.

std::byte is intended for accessing raw data. It lets me replace that damn uint8_t sprinkled all over the codebase with something that actually says "this is raw and unparsed", instead of something that could be misunderstood as a C string.
To underline: std::byte doesn't "try to be a primitive", it represents something even less - raw data.
That it's implemented like this is mostly a quirk of C++ and compiler implementations (layout rules for "primitive" types are much simpler than for a struct or a class).
This kind of thing is mostly found in low-level code where, honestly, printing shouldn't be used, and sometimes isn't even possible.
My use case, for example, is receiving raw bytes over I2C (or RS485) and parsing them into a frame, which is then put into a struct. Why would I want to serialize raw bytes instead of the actual data? Data I will have access to almost immediately?
To sum up this somewhat ranty answer, providing operator overloads for std::byte to work with iostream goes against the intent of this type.
And expressing intent in code as much as possible is one of the important principles of modern programming.

Related

Is static_cast<int>(myUnsignedLongVar) implementation defined?

Common advice when writing C++ states that implicit conversions should be avoided in favour of explicit casts.
int x = myUnsignedLongVar; // should be avoided
while
int x = static_cast<int>(myUnsignedLongVar); // is preferred
Does a static_cast make this conversion any safer? As far as I'm aware both conversions are implementation defined?
Is the static_cast just a signal to indicate more verbosely that this operation is potentially implementation defined?
And whose implementation? Does the operation depend on the implementation of the compiler or of the CPU?
Both examples would produce the same code.
But, in the second one everyone knows that there’s a cast to int.
In the first, one could assume myUnsignedLongVar is an int.
In order to make sure nobody misses the cast, guidelines and compilers recommend making it explicit.
I believe your example is too narrow to show the real difference between the various kinds of casts.
If you are simply casting from unsigned int to int, or from double to int, this may not show its real value.
The real value comes when you do not want casting to cause bugs in your program, for example when comparing signed and unsigned types, converting pointers, or changing object types. Moreover, a C++-style cast, i.e. static_cast, is checked by the compiler.
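For illustration, a minimal sketch of that compiler check (the variable names are made up): a C-style cast between unrelated pointer types compiles silently, while the equivalent static_cast is rejected.
#include <cstdint>

int main() {
    double d = 3.14;
    // C-style cast: silently falls back to reinterpret_cast and compiles.
    std::int64_t* p1 = (std::int64_t*)&d;
    // static_cast: rejected at compile time because the pointer types are unrelated.
    // std::int64_t* p2 = static_cast<std::int64_t*>(&d); // error
    (void)p1;
}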
Another benefit is what you already mentioned: verbosity. They ensure that the author's intent is captured.
Several answers to this question contain a nice summary and comparison between different types of casting.

What is the rationale for the C++ standard not allowing reinterpret_cast<int>( aFloatValue )?

Or equivalently
reinterpret_cast<int64>( aDoubleValue )
Allowing such a reinterpret_cast would not introduce any aliasing issues.
I understand that using memcpy to replicate bytes from a variable of one type into a variable of another type can achieve the necessary effect. And I understand that mature compilers with powerful optimizers can do wonderful things with such memcpy constructs.
That said I happen to be a compiler-writer. I am very aware of the many forms of optimization that must exist and the many preconditions that have to be met in order to pull off those dramatic optimizations. Do we really want to assume that every compiler is that powerful or that every programmer will write code successfully satisfying all of the necessary preconditions? Can we really fault programmers using non-mainstream compilers for being dubious that such a hoped-for optimization will occur?
The need to inspect the bits of a floating point value arises fairly often. Do we really want to insist that every programmer encountering such a need traffic in address-taken variables, char* exceptions to the type aliasing rules, memcpy magic, etc.?
The "new" (now pretty old) forms of casting provided in C++ are all basically subsets of things you can do with C-style casts ("basically" because they have a few minor additions like dealing with references, that just don't exist in C).
To get the equivalent of a reinterpret_cast with a C-style cast, you do something like:
int a = *(int *)&some_float;
So, with a reinterpret_cast, you do pretty much the same thing, just changing the syntax of the cast itself:
int a = *reinterpret_cast<int *>(&some_float);
As for defining what this does, so you can do it without violating any strict aliasing rules and such: I suspect the committee would probably be happy to consider a really well written proposal--I suspect the problem right now is that most think it would take a lot of work, and payoff would be minimal.
If the real intent is to support looking at the bits of a floating point number, it might be simpler to define something like a constructor for a bitset (a bitset<float>, or something on that order), to give a type that's already explicitly defined to give access to bits (though it would still probably be non-trivial to define how it's supposed to work in a way that's at all portable).
If all you want is a function that converts a float into its sequence of bits, memcpy can do that without running afoul of the strict aliasing rule:
#include <cstring>

unsigned int ConvertFloat(float f)
{
    // Assumes sizeof(unsigned int) == sizeof(float).
    unsigned int ret;
    std::memcpy(&ret, &f, sizeof(unsigned int));
    return ret;
}
You could even make a general template function for converting one type to another. One that would static_assert or SFINAE-check if the arguments are trivially-copyable.
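A minimal sketch of such a template, under the assumption that both types are trivially copyable and the same size (C++20 later standardized this idea as std::bit_cast; the name below is made up):
#include <cstring>
#include <type_traits>

template <typename To, typename From>
To bit_cast_sketch(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "types must have the same size");
    static_assert(std::is_trivially_copyable<From>::value &&
                  std::is_trivially_copyable<To>::value,
                  "types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Usage: unsigned int bits = bit_cast_sketch<unsigned int>(1.0f);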
In your line of work, converting floats to integers is a semi-frequent operation. But that's simply not true for most C++ users (or even most C users). So there is no widespread need for such a conversion operation.
Furthermore, the C++ standard doesn't require an implementation to provide any particular representation of a float. So even if the standard provided such a function, it couldn't tell you what it actually returned. There's not even a guarantee that a float can fit into an unsigned int.

Why is the `std::sto`... series not a template?

I wonder if there is a reason why the std::sto series (e.g. std::stoi, std::stol) is not a function template, like that:
template<typename T>
T sto(std::string const & str, std::size_t *pos = 0, int base = 10);
and then:
template<>
int sto<int>(std::string const & str, std::size_t *pos, int base)
{
    // do the stuff.
}

template<>
long sto<long>(std::string const & str, std::size_t *pos, int base)
{
    // do the stuff.
}

/* etc. */
To my mind, that would be a better design, because at the moment, when I have to convert a string into whatever numerical value a user wants, I have to manually manage each case.
Is there a reason not to have such a function template? Was this a deliberate choice, or is it just done like that?
Looking at the description of these functions at cppref, I note the following:
... Interprets a signed integer value in the string str.
1) calls std::strtol(str.c_str(), &ptr, base)...
and strtol is a C standard function that's also available in C++.
Reading further, we see (for the C++ sto* functions):
Return value
The string converted to the specified signed integer type.
Exceptions
std::invalid_argument if no conversion could be performed
std::out_of_range if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or std::strtoll) sets errno to ERANGE.
So while I have no original source for this, and indeed have never worked with these functions, I would guess that:
TL;DR: These functions are C++-ish wrappers around already existing C/C++ functions -- strtol* -- so they resemble those functions as closely as possible.
I have to manually manage each case. Is there a reason not to have such a template function?
In case of such questions, Eric Lippert (C#) usually says something along the lines:
If a feature is missing, then it's missing because no one has implemented it yet. And that's either because no one wanted it yet, because it was considered not worth the effort, or because it couldn't have been finished before publishing the current release.
Here, I guess it's the "not worth the effort" part, but I have neither asked the committee about it nor managed to find any answer in old questions and FAQs. I didn't spend much time searching, though.
I say this because I suppose that most (if not all) of these functions' functionality is already covered by the stream classes, like istringstream. Just like cin etc., it has an operator>> overloaded for all the built-in numeric types (and more).
Furthermore, the stream manipulators like std::hex (std::setbase) already solve the problem of passing various type-dependent configuration parameters to the actual conversion functions. No problems with mixed function signatures (like those mentioned by DavidHaim in his answer). Here it's just a single operator>>.
So, since we already have this in streams, since we can already read numbers etc. from strings with a simple foo >> bar >> setbase(42) >> baz >> ..., I think it was not worth the effort to add more complicated layers over the old C runtime functions.
No proof for that though. Just a hunch.
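For illustration, a minimal sketch of that stream-based approach (the input values are made up):
#include <iomanip>
#include <sstream>
#include <string>

int main()
{
    std::istringstream in("2a 17 3.5");
    int hex_value = 0, dec_value = 0;
    double d = 0.0;
    // operator>> dispatches on the target type; setbase handles the base.
    in >> std::setbase(16) >> hex_value   // reads "2a" as 42
       >> std::setbase(10) >> dec_value   // reads "17" as 17
       >> d;                              // reads "3.5"
}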
The problem with template specialization is that a specialization must match the original template function's signature, so each specialization has to implement the (string, pos, base) interface.
If you would like to support some other type which does not follow this interface, you are in trouble.
Suppose that, in the future, we would like to have sto<std::pair<int,int>>. We would want a pos and a base for both the first and the second stringified integer, so we would like the signature to be of the form (string, pos1, base1, pos2, base2). Since sto's signature is already set, we cannot do that.
You can always wrap std::sto* in your implementation of sto for integral types, but you cannot do that the other way around.
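For example, a minimal sketch of such a wrapper over the existing std::sto* functions (the name sto and the dispatch are illustrative only):
#include <cstddef>
#include <string>

template <typename T>
T sto(const std::string& str, std::size_t* pos = nullptr, int base = 10);

template <>
int sto<int>(const std::string& str, std::size_t* pos, int base)
{
    return std::stoi(str, pos, base);
}

template <>
long sto<long>(const std::string& str, std::size_t* pos, int base)
{
    return std::stol(str, pos, base);
}

// Usage: int n = sto<int>("42");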
The purpose of these functions is to provide simple conversions for common cases. They are not intended as a general-purpose conversion suite. std::istringstream is much better for that kind of thing.
To my mind, that would be a better design, because at the moment, when I have to convert a string into whatever numerical value a user wants, I have to manually manage each case.
No, it would not. The goal of templates (deliberately setting TMP apart) is not to replace overloading; you should always prefer overloading to templates. Actually, it's something the language already does for you: between a candidate function and a possible template instantiation, the former is preferred. Using language features just for the sake of it is bad.
I don't see how templates could help here either. Whatever type the user decides to input won't be known until runtime, while template types are deduced at compile time. C++ is a statically typed language. In this case, templates would just add an unneeded layer of complexity over normal function overloading.

What is std::mbstate_t?

I'm creating a custom locale by deriving from std::codecvt. Most of the methods I'm supposed to implement are pretty straightforward, except for this std::mbstate_t. On my compiler, VS2010, it's declared as an int. But Google tells me it's a POD type, and that it's sometimes a union (of what, I don't know) or a struct (again, I can't find it).
As I understand it, std::mbstate_t is a placeholder for partial conversions. And, I think, it comes into play when std::codecvt::do_out() requires more space to write the output, which in turn will call std::codecvt::do_unshift(). Please correct me if my assumptions are wrong.
I've read another post about storing pointers, though the post doesn't have an adequate answer. I've also read this example, which presumes it to be a 32-bit type, although the standard only guarantees that an int is no less than 16 bits.
My question: what can I safely store in std::mbstate_t? Can I safely replace it with another type? The answer to the above post suggests replacing it, but the following comment says otherwise.
I think that /the/ book concerning these things is C++ IOStreams and Locales by Langer and Kreft; if you seriously want to mess with these things, try to get hold of a copy. Now, coming back to your question: mbstate_t is used to hold the state of the conversion. Normally you would store this inside the conversion facet, but since the facets are immutable, you need to store it externally. In practice, it is used when you need more than a sequence of bytes to determine the corresponding character; the Linux manpage of mbsinit() gives ISO-2022 and UTF-7 as examples of such encodings. Note that this does not affect UTF-8, where a single Unicode codepoint is always encoded by a self-contained sequence of bytes, with nothing before or after it affecting the result. Partial UTF-8 sequences are also not handled via the state; do_in() returns partial instead.
Now, what can you store in the mbstate_t? Since the actual type is undefined and the number of functions to manipulate it is very limited, there is nothing you can do with it at first. However, nothing else does anything with that state either, so you can do some ugly hacking on it. This might require a few #ifdefs depending on the standard library, but then you can simply (ab)use the fact that it's a POD (ints and unions are also PODs) to store pretty much any POD that is not larger. This won't win you a beauty prize, and the code won't work on any system automatically, but I think in this case it's unavoidable, and the work needed for porting is also limited.
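A minimal sketch of that kind of hack, under the stated assumption that the stored type is a POD no larger than mbstate_t (the type and function names below are made up):
#include <cstring>
#include <cwchar>        // std::mbstate_t
#include <type_traits>

// Hypothetical conversion state smuggled into an mbstate_t.
struct ShiftState { unsigned char pending; unsigned char count; };

static_assert(sizeof(ShiftState) <= sizeof(std::mbstate_t),
              "state must fit into mbstate_t");
static_assert(std::is_trivially_copyable<ShiftState>::value,
              "state must be trivially copyable");

inline ShiftState load_state(const std::mbstate_t& mb)
{
    ShiftState s;
    std::memcpy(&s, &mb, sizeof(s));
    return s;
}

inline void store_state(std::mbstate_t& mb, const ShiftState& s)
{
    std::memcpy(&mb, &s, sizeof(s));
}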
Finally, can you replace it? This type is part of std::char_traits, which in turn affects really all strings and streams, so you would need to replace or convert them throughout your program. Further, if you now create a new char_traits class, you still can't easily instantiate e.g. basic_string with it, because there is no guarantee that a general basic_string template even exists; it is only required that the two specializations for char and wchar_t (and a few more for C++11) exist. Ditto for streams. In short: no, you can't replace mbstate_t.

C++11 and [17.5.2.1.3] Bitmask Types

The Standard allows one to choose between an integer type, an enum, and a std::bitset.
Why would a library implementor use one over the other given these choices?
Case in point, llvm's libcxx appears to use a combination of (at least) two of these implementation options:
ctype_base::mask is implemented using an integer type:
<__locale>
regex_constants::syntax_option_type is implemented using an enum + overloaded operators:
<regex>
The gcc project's libstdc++ uses all three:
ios_base::fmtflags is implemented using an enum + overloaded operators: <bits/ios_base.h>
regex_constants::syntax_option_type is implemented using an integer type,
regex_constants::match_flag_type is implemented using a std::bitset
Both: <bits/regex_constants.h>
AFAIK, gdb cannot "detect" the bitfieldness of any of these three choices so there would not be a difference wrt enhanced debugging.
The enum solution and the integer type solution should always use the same space. std::bitset does not seem to guarantee that sizeof(std::bitset<32>) == sizeof(std::uint32_t), so I don't see what is particularly appealing about std::bitset.
The enum solution seems slightly less type safe because combinations of the masks do not yield an enumerator.
Strictly speaking, the aforementioned is with respect to n3376 and not FDIS (as I do not have access to FDIS).
Any available enlightenment in this area would be appreciated.
The really surprising thing is that the standard restricts it to just three alternatives. Why shouldn't a class type be acceptable? Anyway…
Integral types are the simplest alternative, but they lack type safety. Very old legacy code will tend to use these as they are also the oldest.
Enumeration types are safe but cumbersome, and until C++11 they tended to be fixed to the size and range of int.
std::bitset may have somewhat more type safety in that bitset<5> and bitset<6> are different types, and addition is disallowed, but otherwise it is about as unsafe as an integral type. This wouldn't be an issue if types derived from std::bitset<N> had been allowed.
Clearly enums are the ideal alternative, but experience has proven that the type safety is really unnecessary. So they threw implementers a bone and allowed them to take easier routes. The short answer, then, is that laziness leads implementers to choose int or bitset.
It is a little odd that types derived from bitset aren't allowed, but really that's a minor thing.
The main specification that clause provides is the set of operations defined over these types (i.e., the bitwise operators).
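For illustration, a minimal sketch of the enum-plus-overloaded-operators approach (the flag names are made up; real bitmask types such as ios_base::fmtflags are similar in spirit):
enum class open_mode : unsigned { none = 0, read = 1, write = 2, append = 4 };

constexpr open_mode operator|(open_mode a, open_mode b)
{
    return static_cast<open_mode>(static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

constexpr open_mode operator&(open_mode a, open_mode b)
{
    return static_cast<open_mode>(static_cast<unsigned>(a) & static_cast<unsigned>(b));
}

int main()
{
    open_mode m = open_mode::read | open_mode::append;            // combine flags
    bool appending = (m & open_mode::append) != open_mode::none;  // test a flag
    (void)appending;
}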
My preference is to use an enum, but there are sometimes valid reasons to use an integer. Usually ctype_base::mask interacts with the native OS headers, with a mapping from ctype_base::mask to the <ctype.h> implementation-defined constants such as _CTYPE_L and _CTYPE_U used for isupper and islower etc. Using an integer might make it easier to use ctype_base::mask directly with native OS APIs.
I don't know why libstdc++'s <regex> uses a std::bitset. When that code was committed I made a mental note to replace the integer types with an enumeration at some point, but <regex> is not a priority for me to work on.
Why would the standard allow different ways of implementing the library? And the answer is: Why not?
As you have seen, all three options are obviously used in some implementations. The standard doesn't want to make existing implementations non-conforming, if that can be avoided.
One reason to use a bitset could be that its size fits better than an enum or an integer. Not all systems even have a std::uint32_t. Maybe a bitset<24> will work better there?