Use cases of std::byte [duplicate] - c++

The recent addition of std::byte to C++17 got me wondering why this type was added to the standard at all. Even after reading the cppreference page, its use cases don't seem clear to me.
The only use case I can come up with is that it expresses intent more clearly: std::byte should be treated only as a collection of bits, rather than as a character type such as char, which we previously used for both purposes.
Meaning that this:
std::vector<std::byte> memory;
is clearer than this:
std::vector<char> memory;
Is this the only use case and reason it was added to the standard or am I missing a big point here?

The only use case I can come up with is that it more clearly expresses intent
I think it was one of the reasons. This paper explains the motivation behind std::byte and compares its usage with the usage of char:
Motivation and Scope
Many programs require byte-oriented access to
memory. Today, such programs must use either the char, signed char, or
unsigned char types for this purpose. However, these types perform a
“triple duty”. Not only are they used for byte addressing, but also as
arithmetic types, and as character types. This multiplicity of roles
opens the door for programmer error – such as accidentally performing
arithmetic on memory that should be treated as a byte value – and
confusion for both programmers and tools. Having a distinct byte type
improves type-safety, by distinguishing byte-oriented access to memory
from accessing memory as a character or integral value. It improves
readability.
Having the type would also make the intent of code
clearer to readers (as well as tooling for understanding and
transforming programs). It increases type-safety by removing
ambiguities in expression of programmer’s intent, thereby increasing
the accuracy of analysis tools.
Another reason is that std::byte is restricted in terms of operations which can be performed on this type:
Like char and unsigned char, it can be used to access raw memory
occupied by other objects (object representation), but unlike those
types, it is not a character type and is not an arithmetic type. A
byte is only a collection of bits, and only bitwise logic operators
are defined for it.
which ensures additional type safety, as mentioned in the paper above.
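For illustration, here is a minimal sketch of what that restriction means in practice (the commented-out lines do not compile):

#include <cstddef>   // std::byte (C++17)

int main()
{
    std::byte b{0b0001'0100};

    b |= std::byte{0b0000'0001};        // bitwise operators are defined
    b <<= 1;                            // so are shifts

    // b += std::byte{1};               // error: no arithmetic operators
    // char c = b;                      // error: no implicit conversions

    int n = std::to_integer<int>(b);    // explicit conversion when needed
    (void)n;
}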

Why is std::ssize being forced to a minimum size for its signed size type?

In C++20, std::ssize is being introduced to obtain the signed size of a container for generic code. (And the reason for its addition is explained here.)
Somewhat peculiarly, the definition given there (combining with common_type and ptrdiff_t) has the effect of forcing the return value to be "either ptrdiff_t or the signed form of the container's size() return value, whichever is larger".
P1227R1 indirectly offers a justification for this ("it would be a disaster for std::ssize() to turn a size of 60,000 into a size of -5,536").
This seems to me like an odd way to try to "fix" that, however.
Containers which intentionally define a uint16_t size and are known to never exceed 32,767 elements will still be forced to use a larger type than required.
The same would occur for containers using a uint8_t size that are known to never exceed 127 elements.
In desktop environments, you probably don't care; but this might be important for embedded or otherwise resource-constrained environments, especially if the resulting type is used for something more persistent than a stack variable.
Containers which use the default size_t size on 32-bit platforms but which nevertheless do contain between 2B and 4B items will hit exactly the same problem as above.
If there still exist platforms for which ptrdiff_t is smaller than 32 bits, they will hit the same problem as well.
Wouldn't it be better to just use the signed type as-is (without extending its size) and to assert that a conversion error has not occurred (eg. that the result is not negative)?
Am I missing something?
To expand on that last suggestion a bit (inspired by Nicol Bolas' answer): if it were implemented the way that I suggested, then this code would Just Work™:
void DoSomething(int16_t i, T const& item);

for (int16_t i = 0, len = std::ssize(rng); i < len; ++i)
{
    DoSomething(i, rng[i]);
}
With the current implementation, however, this produces warnings and/or errors unless static_casts are explicitly added to narrow the result of ssize, or int i is used instead and then narrowed in the function call (and the range indexing); neither of those seems like an improvement.
Containers which intentionally define a uint16_t size and are known to never exceed 32,767 elements will still be forced to use a larger type than required.
It's not like the container is storing the size as this type. The conversion happens via accessing the value.
As for embedded systems, embedded systems programmers already know about C++'s propensity to increase the size of small types. So if they expect a type to be an int16_t, they're going to spell that out in the code, because otherwise C++ might just promote it to an int.
Furthermore, there is no standard way to ask about what size a range is "known to never exceed". decltype(size(range)) is something you can ask for; sized ranges are not required to provide a max_size function. Without such an ability, the safest assumption is that a range whose size type is uint16_t can assume any size within that range. So the signed size should be big enough to store that entire range as a signed value.
Your suggestion is basically that any ssize call is potentially unsafe, since half of any size range cannot be validly stored in the return type of ssize.
Containers which use the default size_t size on 32-bit platforms but which nevertheless do contain between 2B and 4B items will hit exactly the same problem as above.
Assuming that it is valid for ptrdiff_t to not be a signed 64-bit integer on such platforms, there isn't really a valid solution to that problem. So yes, there will be cases where ssize is potentially unsafe.
ssize currently is potentially unsafe in cases where it is not possible to be safe. Your proposal would make ssize potentially unsafe in all cases.
That's not an improvement.
And no, merely asserting/contract checking is not a viable solution. The point of ssize is to make for(int i = 0; i < std::ssize(rng); ++i) work without the compiler complaining about signed/unsigned mismatch. To get an assert because of a conversion failure that didn't need to happen (and BTW, cannot be corrected without using std::size, which we are trying to avoid), one which is ultimately irrelevant to your algorithm? That's a terrible idea.
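To make that concrete, here is a minimal sketch of the intended use (C++20; the process function is a made-up example):

#include <iterator>   // std::ssize (C++20)
#include <vector>

void process(const std::vector<int>& v)
{
    // for (int i = 0; i < v.size(); ++i)     // warns: signed/unsigned mismatch
    for (int i = 0; i < std::ssize(v); ++i)   // both sides signed: no warning
        (void)v[i];
}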
if it were implemented the way that I suggested, then this code would Just Work™:
Let us ignore the question of how often it is that a user would write this code.
The reason your compiler will expect/require you to use a cast there is because you are asking for an inherently dangerous operation: you are potentially losing data. Your code only "Just Works™" if the current size fits into an int16_t; that makes the conversion statically dangerous. This is not something that should implicitly take place, so the compiler suggests/requires you to explicitly ask for it. And users looking at that code get a big, fat eyesore reminding them that a dangerous thing is being done.
That is all to the good.
See, if your suggested implementation were how ssize behaved, then that means we must treat every use of ssize as just as inherently dangerous as the compiler treats your attempted implicit conversion. But unlike static_cast, ssize is small and easily missed.
Dangerous operations should be called out as such. Since ssize is small and difficult to notice by design, it therefore should be as safe as possible. Ideally, it should be as safe as size, but failing that, it should be unsafe only to the extent that it is impossible to make it safe.
Users should not look on ssize usage as something dubious or disconcerting; they should not fear to use it.

Making unsigned integer underflow throw an exception

I understand that there are applications in which using unsigned integer over/underflow is a good way to get cheap modular arithmetic.
In my code, I use uint exclusively for indices to containers, so I never want this behaviour.
Is this a bad idea? Should I be using int everywhere instead? I do have to do some unsavoury things to get a for loop to count down to 0.
Is there a commonly used implementation of a less unsafe unsigned integer type? Something that throws an exception?
Do compilers (for me gcc, clang) provide a mechanism for less unsafe behaviour in the given compilation unit?
First, a terminology quibble: there is no such thing as unsigned integer underflow, precisely because unsigned types wrap around (using modular arithmetic); "wrap-around" is probably the phrase you meant.
Second, is this a common scenario to be in? Yes, it is a bit. You're not the only one doing "unsavoury things" with loops for reverse counting (see the sketch below), and I bet there are a ton of bugs out there where people haven't done "unsavoury things" and, as a result, their code has an unsavoury infinite loop hidden in it. Mind you, I'm not sure I'd go so far as to call unsigneds "unsafe" as a result; like anything, they are the right tool for a subset of infinite possible jobs, and within that subset they are perfectly safe.
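For reference, the usual non-buggy way to count down to zero with an unsigned index is the following idiom (a sketch; the decrement happens in the loop condition, before the body, so the index never wraps while the loop is still running):

#include <cstddef>

void reverse_walk(const int* data, std::size_t n)
{
    for (std::size_t i = n; i-- > 0; )
        (void)data[i];   // visits n-1, n-2, ..., 0, then the test fails
}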
There is debate over whether unsigned integers should be used for array indexes at all. Some standard committee members believe that their use in the standard library was a mistake; I know that several members of the C++ community here on Stack Overflow also hate unsigned values and wish they'd go away.
Personally I think having access to the full range of the integer by default is absolutely crucial (and losing that is not worth it for a single "-1" sentinel value or whatever), so I think that — while you're not alone in this requirement, and it's a sensible requirement — using unsigned array indexes by default is a good thing. (And what the heck is a negative array index? Semantics, people!)
But that doesn't help you in this scenario. So, what can you do about it? No, there's no trapping unsigned integer implementation (at least, not one that I'm aware of, let alone widespread) because that would literally violate the rules of the type as defined by C++: it would introduce well-defined underflow/overflow semantics to a type for which underflow/overflow shouldn't even be possible.
You will have to use signed integers and check for "logical underflow" (i.e. going out of your desired range, say -1) yourself. You could wrap this behaviour in a class.
I suppose you could actually just wrap an unsigned integer while you're at it, adding some extra logic to operator-- and operator-= to detect a wrap-around and throw.
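A minimal sketch of that idea follows (the class name and choice of exception are mine, not any standard facility):

#include <cstddef>
#include <stdexcept>

class checked_index {
    std::size_t value_;
public:
    explicit checked_index(std::size_t v = 0) : value_(v) {}

    checked_index& operator--() {              // detect wrap-around before it happens
        if (value_ == 0)
            throw std::underflow_error("checked_index: decrement past zero");
        --value_;
        return *this;
    }

    checked_index& operator-=(std::size_t n) {
        if (n > value_)
            throw std::underflow_error("checked_index: subtraction past zero");
        value_ -= n;
        return *this;
    }

    operator std::size_t() const { return value_; }   // usable as an index
};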
But I guess my point is that, whatever you do, it's going to be in your "code space" and thus subject to decreased performance. You can't eke out this behaviour from the platform itself.

Once again: strict aliasing rule and char*

The more I read, the more confused I get.
The last question from the related ones is closest to my question, but I got confused with all words about object lifetime and especially - is it OK to only read or not.
To get straight to the point. Correct me if I'm wrong.
This is fine, gcc does not give a warning, and I'm trying to "read type T (uint32_t) via char*":
uint32_t num = 0x01020304;
char* buff = reinterpret_cast< char* >( &num );
But this is "bad" (also gives a warning) and I'm trying "the other way around":
char buff[ 4 ] = { 0x1, 0x2, 0x3, 0x4 };
uint32_t num = *reinterpret_cast< uint32_t* >( buff );
How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)? Plus, adding const does not change the situation in any way.
Or this is just a straight rule, which clearly states: "this can be done in the one direction, but not in the other"?
I couldn't find anything relevant in the standards (searched for this especially in C++11 standard).
Is this the same for C and C++ (as I read a comment, implying it's different for the 2 languages)?
I used a union to "work around" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states that I can only rely on the value last modified in the union).
So, after reading a lot, I'm now more confused. I guess only memcpy is the "good" solution?
Related questions:
What is the strict aliasing rule?
"dereferencing type-punned pointer will break strict-aliasing rules" warning
Do I understand C/C++ strict-aliasing correctly?
Strict aliasing rule and 'char *' pointers
EDIT
The real world situation: I have a third party lib (http://www.fastcrypto.org/), which calculates UMAC and the returned value is in char[ 4 ]. Then I need to convert this to uint32_t. And, btw, the lib uses things like ((UINT32 *)pc->nonce)[0] = ((UINT32 *)nonce)[0] a lot. Anyway.
Also, I'm asking about what is right and what is wrong and why. Not only about the reordering, optimization, etc. (what's interesting is that with -O0 there are no warnings, only with -O2).
And please note: I'm aware of the big/little endian situation. It's not the case here. I really want to ignore the endianness here. The "strict aliasing rules" sounds like something really serious, far more serious than wrong endianness. I mean - like accessing/modifying memory, which is not supposed to be touched; any kind of UB at all.
Quotes from the standards (C and C++) would be really appreciated. I couldn't find anything about aliasing rules or anything relevant.
How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)?
The problem is in the compiler using the rules to determine whether such an optimization is allowed. In the second case you're trying to read a char[] object via an incompatible pointer type, which is undefined behavior; hence, the compiler might re-order the read and write (or do anything else which you might not expect).
But, there are exceptions for "going the other way", i.e. reading an object of some type via a character type.
Or this is just a straight rule, which clearly states: "this can be done in the one direction, but not in the other"? I couldn't find anything relevant in the standards (searched for this especially in C++11 standard).
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf chapter 3.10 paragraph 10.
In C99, and also C11, it's 6.5 paragraph 7. For C++11, it's 3.10 ("Lvalues and Rvalues").
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of character type for C or of either unsigned char or char type for C++). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
I used a union to "work around" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states that I can only rely on the value last modified in the union).
Although the wording of the standard is horribly ambiguous, in C99 (and beyond) it's clear (at least since C99 TC3) that the intent is to allow type-punning through a union. You must however perform all accesses through the union. It's also not clear that you can "cast a union into existence", that is, the union object must exist first before you use it for type-punning.
the returned value is in char[ 4 ]. Then I need to convert this to uint32_t
Just use memcpy or manually shift the bytes to the correct position, in case byte-ordering is an issue. Good compilers can optimize this out anyway (yes, even the call to memcpy).
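For example, a sketch of the memcpy approach (the function name is mine; the byte order of the result is whatever the platform uses, so shift bytes manually if you need a fixed endianness):

#include <cstdint>
#include <cstring>

std::uint32_t load_u32(const char* buff)
{
    std::uint32_t num;
    std::memcpy(&num, buff, sizeof num);   // copies the object representation;
    return num;                            // no aliasing violation
}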
I used a union to "work around" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states that I can only rely on the value last modified in the union).
Endianness is the reason for this. Specifically, the sequence of bytes 01 00 00 00 could mean 1 or 16,777,216.
The correct way to do what you are doing is to stop trying to trick the compiler into doing the conversion for you and perform the conversion yourself.
For instance if the char[4] is little-endian (smallest byte first) then you would do something like the following.
char buff[4] = { 0x04, 0x03, 0x02, 0x01 };   // little-endian representation of 0x01020304
uint32_t result = 0;
for (int i = 0; i < 4; i++)
    result |= static_cast<uint32_t>(static_cast<unsigned char>(buff[i])) << (8 * i);
This manually performs the conversion between the two and is guaranteed to always be correct as you are doing the mathematical conversion.
Now if you need this conversion to be fast, it might make sense to use #if and knowledge of your architecture to use a union to do this automatically, as you mentioned, but that is again getting away from portable solutions. (You can also keep something like this as a fallback if you can't be certain.)

`short int` vs `int`

Should I bother using short int instead of int? Is there any useful difference? Any pitfalls?
short vs int
Don't bother with short unless there is a really good reason such as saving memory on a gazillion values, or conforming to a particular memory layout required by other code.
Using lots of different integer types just introduces complexity and possible wrap-around bugs.
On modern computers it might also introduce needless inefficiency.
const
Sprinkle const liberally wherever you can.
const constrains what might change, making it easier to understand the code: you know that this beastie is not gonna move, so it can be ignored, and your thinking can be directed at more useful/relevant things.
Top-level const for formal arguments is, however, by convention omitted, possibly because the gain is not enough to outweigh the added verbosity.
Also, in a pure declaration of a function, top-level const for an argument is simply ignored by the compiler. On the other hand, some tools may not be smart enough to ignore it when comparing pure declarations to definitions; one person cited that in an earlier debate on the issue in the comp.lang.c++ Usenet group. So it depends to some extent on the toolchain, but happily I've never used tools that place any significance on those consts.
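For instance, a small sketch of that convention:

// Declaration: top-level const is omitted (the compiler ignores it anyway).
int twice(int x);

// Definition: const documents that x is never modified in the body.
int twice(const int x)
{
    return x * 2;
}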
Cheers & hth.,
Absolutely not in function arguments. Few calling conventions are going to make any distinction between short and int. If you're making giant arrays you could use short if your data fits in short to save memory and increase cache effectiveness.
What Ben said. You will actually create less efficient code, since the registers need to strip out the upper bits whenever any comparisons are done. Unless you need to save memory because you have tons of them, use the native integer size. That's what int is for.
EDIT: Didn't even see your sub-question about const. Using const on intrinsic types (int, float) is useless, but any pointers/references should absolutely be const whenever applicable. Same for class methods as well.
The question is technically malformed "Should I use short int?". The only good answer will be "I don't know, what are you trying to accomplish?".
But let's consider some scenarios:
You know the definite range of values that your variable can take.
The ranges for signed integers are:
signed char: -2⁷ to 2⁷-1
short: -2¹⁵ to 2¹⁵-1
int: -2¹⁵ to 2¹⁵-1
long: -2³¹ to 2³¹-1
long long: -2⁶³ to 2⁶³-1
We should note here that these are guaranteed ranges, they can be larger in your particular implementation, and often are. You are also guaranteed that the previous range cannot be larger than the next, but they can be equal.
You will quickly note that short and int actually have the same guaranteed range. This gives you very little incentive to use it. The only reason to use short given this situation becomes giving other coders a hint that the values will be not too large, but this can be done via a comment.
It does, however, make sense to use signed char, if you know that you can fit every potential value in the range -128 to 127.
You don't know the exact range of potential values.
In this case you are in a rather bad position to attempt to minimise memory usage, and should probably use at least int. Although it has the same minimum range as short, on many platforms it may be larger, and this will help you out.
But the bigger problem is that you are trying to write a piece of software that operates on values, the range of which you do not know. Perhaps something wrong has happened before you have started coding (when requirements were being written up).
You have an idea about the range, but realise that it can change in the future.
Ask yourself how close to the boundary you are. If we are talking about something that goes from -1000 to +1000 and could potentially change to -1500 to +1500, then by all means use short. The specific architecture may pad your value, which will mean you won't save any space, but you won't lose anything. However, if we are dealing with some quantity that is currently -14000 to +14000 and can grow unpredictably (perhaps it's some financial value), then don't just switch to int, go to long right away. You will lose some memory, but will save yourself a lot of headache catching these roll-over bugs.
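One way to make such range decisions explicit is to use the fixed-width types from <cstdint> rather than guessing what short or long means on a given platform. A sketch (the struct and field names are made up):

#include <cstdint>

struct Measurement {
    std::int8_t  calibration;   // guaranteed to hold -128 to 127
    std::int16_t offset;        // fits -1500 to +1500 comfortably
    std::int32_t amount;        // currently about ±14000 but may grow:
                                // take the next size up, not the tight fit
};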
short vs int - If your data will fit in a short, use a short. Save memory. Make it easier for the reader to know how much data your variable may fit.
use of const - Great programming practice. If your data should be a const then make it const. It is very helpful when someone reads your code.

What is the uintptr_t data type?

What is uintptr_t and what can it be used for?
First thing, at the time the question was asked, uintptr_t was not in C++. It's in C99, in <stdint.h>, as an optional type. Many C++03 compilers do provide that file. It's also in C++11, in <cstdint>, where again it is optional, and which refers to C99 for the definition.
In C99, it is defined as "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Take this to mean what it says. It doesn't say anything about size.
uintptr_t might be the same size as a void*. It might be larger. It could conceivably be smaller, although such a C++ implementation approaches perverse. For example on some hypothetical platform where void* is 32 bits, but only 24 bits of virtual address space are used, you could have a 24-bit uintptr_t which satisfies the requirement. I don't know why an implementation would do that, but the standard permits it.
uintptr_t is an unsigned integer type that is capable of storing a data pointer (whether it can hold a function pointer is unspecified). Which typically means that it's the same size as a pointer.
It is optionally defined in C++11 and later standards.
A common reason to want an integer type that can hold an architecture's pointer type is to perform integer-specific operations on a pointer, or to obscure the type of a pointer by providing it as an integer "handle".
It's an unsigned integer type exactly the size of a pointer. Whenever you need to do something unusual with a pointer - like for example invert all bits (don't ask why) you cast it to uintptr_t and manipulate it as a usual integer number, then cast back.
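Another classic trick along those lines is storing a flag in the otherwise-unused low bits of an aligned pointer. A sketch (tag_pointer and untag_pointer are made-up names; it assumes T is aligned to at least 4 bytes, so the two low bits of a valid pointer are zero):

#include <cstdint>

template <typename T>
void* tag_pointer(T* p, std::uintptr_t flag)   // flag must fit in the low 2 bits
{
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) | flag);
}

template <typename T>
T* untag_pointer(void* tagged)
{
    auto bits = reinterpret_cast<std::uintptr_t>(tagged);
    return reinterpret_cast<T*>(bits & ~std::uintptr_t{3});   // clear the tag bits
}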
There are already many good answers to "what is uintptr_t data type?". I will try to address the "what it can be used for?" part in this post.
Primarily for bitwise operations on pointers. Remember that in C++ one cannot perform bitwise operations on pointers. For reasons see Why can't you do bitwise operations on pointer in C, and is there a way around this?
Thus in order to do bitwise operations on pointers one would need to cast pointers to type uintptr_t and then perform bitwise operations.
Here is an example of a function I wrote to XOR two pointers, for storing them in an XOR linked list so that we can traverse in both directions like a doubly linked list, but without the penalty of storing two pointers in each node.
template <typename T>
T* xor_ptrs(T* t1, T* t2)
{
    return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(t1) ^
                                reinterpret_cast<uintptr_t>(t2));
}
Running the risk of getting another Necromancer badge, I would like to add one very good use for uintptr_t (or even intptr_t) and that is writing testable embedded code.
I write mostly embedded code targeted at various ARM and, currently, Tensilica processors. These have various native bus widths, and the Tensilica is actually a Harvard architecture with separate code and data buses that can be different widths.
I use a test driven development style for much of my code which means I do unit tests for all the code units I write. Unit testing on actual target hardware is a hassle so I typically write everything on an Intel based PC either in Windows or Linux using Ceedling and GCC.
That being said, a lot of embedded code involves bit twiddling and address manipulation. Most of my Intel machines are 64-bit. So if you are going to test address-manipulation code, you need a generalized type to do math on. Thus uintptr_t gives you a machine-independent way of debugging your code before you try deploying to target hardware.
Another issue is that on some machines, or even in some compilers' memory models, function pointers and data pointers have different widths. On those machines the compiler may not even allow casting between the two classes of pointer, but uintptr_t should be able to hold either.
-- Edit --
As #chux pointed out, this is not part of the standard, and functions are not objects in C. However, it usually works, and since many people don't even know about these types, I usually leave a comment explaining the trickery. Other searches on SO for uintptr_t will provide further explanation. Also, we do things in unit testing that we would never do in production, because breaking things is good.