Once again: strict aliasing rule and char* - c++

The more I read, the more confused I get.
The last question from the related ones is closest to my question, but I got confused with all words about object lifetime and especially - is it OK to only read or not.
To get straight to the point. Correct me if I'm wrong.
This is fine, gcc does not give warning and I'm trying to "read type T (uint32_t) via char*":
uint32_t num = 0x01020304;
char* buff = reinterpret_cast< char* >( &num );
But this is "bad" (also gives a warning) and I'm trying "the other way around":
char buff[ 4 ] = { 0x1, 0x2, 0x3, 0x4 };
uint32_t num = *reinterpret_cast< uint32_t* >( buff );
How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)? Plus, adding const does not change the situation in any way.
Or this is just a straight rule, which clearly states: "this can be done in the one direction, but not in the other"?
I couldn't find anything relevant in the standards (searched for this especially in C++11 standard).
Is this the same for C and C++ (as I read a comment, implying it's different for the 2 languages)?
I used union to "workaround" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states, that I can only rely on the value, which is last modified in the union).
So, after reading a lot, I'm now more confused. I guess only memcpy is the "good" solution?
Related questions:
What is the strict aliasing rule?
"dereferencing type-punned pointer will break strict-aliasing rules" warning
Do I understand C/C++ strict-aliasing correctly?
Strict aliasing rule and 'char *' pointers
The real world situation: I have a third party lib (http://www.fastcrypto.org/), which calculates UMAC and the returned value is in char[ 4 ]. Then I need to convert this to uint32_t. And, btw, the lib uses things like ((UINT32 *)pc->nonce)[0] = ((UINT32 *)nonce)[0] a lot. Anyway.
Also, I'm asking about what is right and what is wrong and why. Not only about the reordering, optimization, etc. (what's interesting is that with -O0 there are no warnings, only with -O2).
And please note: I'm aware of the big/little endian situation. It's not the case here. I really want to ignore the endianness here. The "strict aliasing rules" sounds like something really serious, far more serious than wrong endianness. I mean - like accessing/modifying memory, which is not supposed to be touched; any kind of UB at all.
Quotes from the standards (C and C++) would be really appreciated. I couldn't find anything about aliasing rules or anything relevant.

How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)?
The problem is in the compiler using the rules to determine whether such an optimization is allowed. In the second case you're trying to read a char[] object via an incompatible pointer type, which is undefined behavior; hence, the compiler might re-order the read and write (or do anything else which you might not expect).
But, there are exceptions for "going the other way", i.e. reading an object of some type via a character type.
Or this is just a straight rule, which clearly states: "this can be done in the one direction, but not in the other"? I couldn't find anything relevant in the standards (searched for this especially in C++11 standard).
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf chapter 3.10 paragraph 10.
In C99, and also C11, it's 6.5 paragraph 7. For C++11, it's 3.10 ("Lvalues and Rvalues").
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of character type for C or of either unsigned char or char type for C++). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
I used union to "workaround" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states, that I can only rely on the value, which is last modified in the union).
Although the wording of the standard is horribly ambiguous, in C99 (and beyond) it's clear (at least since C99 TC3) that the intent is to allow type-punning through a union. You must however perform all accesses through the union. It's also not clear that you can "cast a union into existence", that is, the union object must exist first before you use it for type-punning.
the returned value is in char[ 4 ]. Then I need to convert this to uint32_t
Just use memcpy or manually shift the bytes to the correct position, in case byte-ordering is an issue. Good compilers can optimize this out anyway (yes, even the call to memcpy).
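A minimal sketch of the memcpy approach, assuming the four bytes are already in the host's byte order (the question explicitly sets endianness aside). The helper names load_u32 and store_u32 are made up for illustration and are not part of the UMAC library.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the four bytes at `buff` into a uint32_t.
// The numeric result depends on the host's byte order.
inline std::uint32_t load_u32(const char* buff) {
    std::uint32_t value;
    std::memcpy(&value, buff, sizeof value);  // byte copy, no aliasing violation
    return value;
}

// Hypothetical helper for the reverse direction, also via memcpy.
inline void store_u32(std::uint32_t value, char* buff) {
    std::memcpy(buff, &value, sizeof value);
}
```

Because no uint32_t lvalue is ever used to access the char array, the strict aliasing rule is not involved; optimizing compilers typically reduce each call to a single load or store.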

I used union to "workaround" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states, that I can only rely on the value, which is last modified in the union).
Endianness is the reason for this. Specifically, the sequence of bytes 01 00 00 00 could mean 1 or 16,777,216.
The correct way to do what you are doing is to stop trying to trick the compiler into doing the conversion for you and perform the conversion yourself.
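For instance, if the char[4] is little-endian (least significant byte first), then you would do something like the following.
unsigned char buff[4] = { 0x1, 0x2, 0x3, 0x4 }; // bytes to convert; unsigned avoids sign extension
uint32_t result = 0;
for (int i = 0; i < 4; i++)
    result |= static_cast<uint32_t>(buff[i]) << (8 * i); // byte i supplies bits 8*i .. 8*i+7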
This manually performs the conversion between the two and is guaranteed to always be correct as you are doing the mathematical conversion.
Now, if you were doing this conversion in a hot path, it might make sense to use #if and knowledge of your architecture to use a union to do this automatically, as you mentioned, but that is again getting away from portable solutions. (You can also use something like the loop above as your fallback if you can't be certain of the architecture.)


Use cases of std::byte [duplicate]

The recent addition of std::byte to C++17 got me wondering why this type was added to the standard at all. Even after reading the cppreference page, its use cases don't seem clear to me.
The only use case I can come up with is that it more clearly expresses intent, as std::byte should only be treated as a collection of bits instead of a character type such as char which we used for both purposes before.
Meaning that:
std::vector<std::byte> memory;
Is more clear than this:
std::vector<char> memory;
Is this the only use case and reason it was added to the standard or am I missing a big point here?
The only use case I can come up with is that it more clearly expresses intent
I think it was one of the reasons. This paper explains the motivation behind std::byte and compares its usage with the usage of char:
Motivation and Scope
Many programs require byte-oriented access to
memory. Today, such programs must use either the char, signed char, or
unsigned char types for this purpose. However, these types perform a
“triple duty”. Not only are they used for byte addressing, but also as
arithmetic types, and as character types. This multiplicity of roles
opens the door for programmer error – such as accidentally performing
arithmetic on memory that should be treated as a byte value – and
confusion for both programmers and tools. Having a distinct byte type
improves type-safety, by distinguishing byte-oriented access to memory
from accessing memory as a character or integral value. It improves readability.
Having the type would also make the intent of code
clearer to readers (as well as tooling for understanding and
transforming programs). It increases type-safety by removing
ambiguities in expression of programmer’s intent, thereby increasing
the accuracy of analysis tools.
Another reason is that std::byte is restricted in terms of operations which can be performed on this type:
Like char and unsigned char, it can be used to access raw memory
occupied by other objects (object representation), but unlike those
types, it is not a character type and is not an arithmetic type. A
byte is only a collection of bits, and only bitwise logic operators
are defined for it.
which provides additional type safety, as mentioned in the paper above.
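As a rough illustration of that restriction, the sketch below (assuming a C++17 compiler; the helper names low_nibble and as_int are invented for this example) uses only operations that std::byte actually provides:

```cpp
#include <cstddef>  // std::byte, std::to_integer

// Hypothetical helper: keeps only the low four bits of a byte.
inline std::byte low_nibble(std::byte b) {
    return b & std::byte{0x0F};  // bitwise operators are defined for std::byte
}

// Treating the byte as a number requires an explicit conversion.
inline int as_int(std::byte b) {
    return std::to_integer<int>(b);
}

// Neither of the following lines would compile:
//   std::byte sum = std::byte{1} + std::byte{2};  // no arithmetic operators
//   int n = std::byte{1};                         // no implicit conversion
```

With char or unsigned char, both commented-out lines would compile silently, which is the kind of accidental arithmetic the paper describes.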

Undefined behaviour in RE2 which is stated to be well defined

Recently I've found that the RE2 library uses this technique for fast set lookups. During the lookup it uses values from an uninitialized array, which, as far as I know, is undefined behaviour.
I've even found this issue with valgrind warnings about use of uninitialized memory. But the issue was closed with a comment that this behaviour is intended.
I suppose that in reality an uninitialized array will just contain some random data on all modern compilers and architectures. But on the other hand I treat the 'undefined behaviour' statement as 'literally anything can happen' (including your program formats your hard drive or Godzilla comes and destroys your city).
The question is: is it legal to use uninitialized data in C++?
When C was originally designed, if arr was an array of some type T occupying N bytes, an expression like arr[i] meant "take the base address of arr, add i*N to it, fetch N bytes at the resulting address, and interpret them as a T". If every possible combination of N bytes would have a meaning when interpreted as a type T, fetching an uninitialized array element may yield an arbitrary value, but the behavior would otherwise be predictable. If T is a 32-bit type, an attempt to read an uninitialized array element of type T would yield one of at most 4294967296 possible behaviors; such action would be safe if and only if every one of those 4294967296 behaviors would meet a program's requirements. As you note, there are situations where such a guarantee is useful.
The C Standard, however, describes a semantically-weaker language which does not guarantee that an attempt to read an uninitialized array element will behave in a fashion consistent with any bit pattern the storage might contain. Compiler writers want to process this weaker language, rather than the one Dennis Ritchie invented, because it allows them to apply a number of optimizations without regard for how they interact. For example, if code performs a=x; and later performs b=a; and c=a;, and if a compiler can't "see" anything between the original assignment and the later ones that could change a or x, it could omit the first assignment and change the latter two assignments to b=x; and c=x;. If, however, something happens between the latter two assignments that would change x, that could result in b and c getting different values--something that should be impossible if nothing changes a.
Applying that optimization by itself wouldn't be a problem if nothing changed x that shouldn't. On the other hand, consider code which uses some allocated storage as type float, frees it, re-allocates it, and uses it as type int. If the compiler knows that the original and replacement request are of the same size, it could recycle the storage without freeing and reallocating it. That could, however, cause the code sequence:
float *fp = malloc(4);
*fp = slowCalculation();
somethingElse = *fp;
int *ip = malloc(4);
to get rewritten as:
float *fp = malloc(4);
startSlowCalculation(); // Use some pipelined computation unit
int *ip = (int*)fp;
*fp = resultOfSlowCalculation(); // ** Moved from up above
somethingElse = *fp;
It would be rare for performance to benefit particularly from processing the result of the slow calculation between the writes to b and c. Unfortunately, compilers aren't designed in a way that would make it convenient to guarantee that a deferred calculation wouldn't by chance land in exactly the spot where it would cause trouble.
Personally, I regard compiler writers' philosophy as severely misguided: if a programmer in a certain situation knows that a guarantee would be useful, requiring the programmer to work around the lack of it will impose significant cost with 100% certainty. By contrast, a requirement that compilers refrain from optimizations that are predicated on the lack of that guarantee would rarely cost anything (since code to work around its absence would almost certainly block the "optimization" anyway). Unfortunately, some people seem more interested in optimizing the performance of those source texts which don't need guarantees beyond what the Standard mandates, than in optimizing the efficiency with which a compiler can generate code to accomplish useful tasks.

Trouble reading line of code with reference & dereference operators

I'm having trouble reading through a series of * and & operators in order to understand two lines of code within a method. The lines are:
int dummy = 1;
if (*(char*)&dummy) { //Do stuff
As best I can determine:
dummy is allocated on the stack and its value is set to 1
&dummy returns the memory location being used by dummy (i.e. where the 1 is)
(char*)&dummy casts &dummy into a pointer to a char, instead of a pointer to an int
*(char*)&dummy dereferences (char*)&dummy, returning whatever char has a numeric value of 1
This seems like an awfully confusing way to say:
if (1) { /* Do stuff */ }
Am I understanding these lines correctly? If so, why would someone do this (other than to confuse me)?
The result of this code is not portable, but it is apparently intended to detect the endianness of the system: where the non-zero byte of int(1) is located depends on whether the system is big- or little-endian. In one case the expression yields 0, in the other it yields a non-zero value. The access itself is not undefined behavior, since any object may be read through an lvalue of type char; only the value read depends on the platform. In theory there is also DS9k endianness, which garbles the byte order entirely (although I don't think any real system does this).
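A sketch of the same check written with memcpy instead of a pointer cast, for readers who find that form easier to follow. The function name is_little_endian is hypothetical, and the sketch only distinguishes "lowest-addressed byte holds the least significant bits" from everything else:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true when the least significant byte of an
// integer is stored at the lowest address (little-endian layout).
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // copy the lowest-addressed byte
    return first_byte == 1;
}
```

On a given platform this returns the same answer as the `*(char*)&dummy` expression in the question.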

Is pointer conversion expensive or not?

Is pointer conversion considered expensive (e.g. how many CPU cycles does it take to convert a pointer/address), especially when you have to do it quite frequently? For instance (just an example to show the scale of frequency; I know there are better ways for this particular case):
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
A[i]=foo((unsigned char*)x+i);
(e.g. how many CPU cycles it takes to convert a pointer/address)
In most machine code languages there is only 1 "type" of pointer and so it doesn't cost anything to convert between them. Keep in mind that C++ types really only exist at compile time.
The real issue is that this sort of code can break strict aliasing rules. You can read more about this elsewhere, but essentially the compiler will either produce incorrect code through undefined behavior, or be forced to make conservative assumptions and thus produce slower code. (Note that char* and friends are somewhat exempt from the undefined behavior part.)
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10) because it could be that *y is an alias of x. This could be the case after an assignment like y = &x.
As an effect of the assignment to *y, the value of x would be changed as well, so propagating the information that x is 5 to the statements following *y = 10 would be potentially wrong (if *y is indeed an alias of x). However, if we have information about pointers, the constant propagation process could make a query like: can x be an alias of *y? Then, if the answer is no, x = 5 can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x is not aliased by *y, then code that uses or changes the value of x can be moved before the assignment *y = 10, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometimes allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted optimizations that could yield unexpected code.
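The two situations described in the quoted passage can be sketched roughly as follows. The function names are invented for illustration; what a particular compiler actually emits depends on its version and flags.

```cpp
// Pointers of the same type may alias, so the compiler has to reload *x
// after the store through y.
inline int same_type(int* x, int* y) {
    *x = 5;
    *y = 10;    // modifies *x when x == y
    return *x;  // 5 if the pointers differ, 10 if they alias
}

// Under strict aliasing a float lvalue may not access an int object, so the
// compiler is allowed to assume the store through y leaves *x untouched and
// to propagate the constant 5 into the return statement.
inline int different_types(int* x, float* y) {
    *x = 5;
    *y = 10.0f;
    return *x;
}
```

Calling different_types with a float* that was obtained by casting the address of the same int would be the kind of aliasing violation the rule forbids.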
What is the strict aliasing rule?
On any architecture you're likely to encounter, all pointer types have the same representation, and so conversion between different pointer types representing the same address has no run-time cost. This applies to all pointer conversions in C.
In C++, some pointer conversions have a cost and some don't:
reinterpret_cast and const_cast (or an equivalent C-style cast, such as the one in the question) and conversion to or from void* will simply reinterpret the pointer value, with no cost.
Conversion between pointer-to-base-class and pointer-to-derived class (either implicitly, or with static_cast or an equivalent C-style cast) may require adding a fixed offset to the pointer value if there are multiple base classes.
dynamic_cast will do a non-trivial amount of work to look up the pointer value based on the dynamic type of the object pointed to.
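The base-class case can be illustrated with a small sketch. The types A, B and D and the helper names are hypothetical, and the exact layout is an assumption about common ABIs, not something the standard mandates.

```cpp
struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };  // hypothetical types with two non-empty bases

// Derived-to-base conversion: the compiler adjusts the address so that the
// result points at the B subobject inside the D object.
inline B* to_second_base(D* p) { return p; }

// Base-to-derived conversion: static_cast applies the inverse adjustment.
inline D* from_second_base(B* p) { return static_cast<D*>(p); }
```

On typical implementations the B subobject sits after the A subobject, so to_second_base returns an address a few bytes past the start of the object. That addition is the entire run-time cost.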
Historically, some architectures (e.g. PDP-10) had different representations for pointer-to-byte and pointer-to-word; there may be some runtime cost for conversions there.
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
A[i]=foo((unsigned char*)x+i); // bad cast
Remember, the machine only knows memory addresses, data and code. Everything else (such as types) is known only to the compiler, which is there to aid the programmer; it does all the pointer arithmetic, because only the compiler knows the size of each type.
At runtime, no machine cycles are spent converting one pointer type to another, because the conversion does not happen at runtime. All pointers are simply addresses of the same width (4 bytes on a typical 32-bit machine).
It all depends on your underlying hardware.
On most machine architectures, all pointers are byte pointers, and converting between a byte pointer and a byte pointer is a no-op. On some architectures, a pointer conversion may under some circumstances require extra manipulation (there are machines that work with word based addresses for instance, and converting a word pointer to a byte pointer or vice versa will require extra manipulation).
Moreover, this is in general an unsafe technique, as the compiler can't perform any sanity checking on what you are doing, and you can end up overwriting data you didn't expect.

What does this C++ construct do?

Somewhere in lines of code, I came across this construct...
//void* v = void* value from an iterator
int i = (int)(long(v))
What possible purpose can this construct serve?
Why not simply use int(v) instead? Why the cast to long first?
It most likely silences warnings.
Assuming an architecture with sizeof(int) < sizeof(long) and sizeof(long) == sizeof(void *) (typical of 64-bit LP64 systems), you may get a warning if you cast a void * to an int, and no warning if you cast a void * to a long, as you're not truncating. You then get a warning when assigning a long to an int (possible truncation), which is removed by explicitly casting from long to int.
Without knowing the compiler it's hard to say, but I've certainly seen multi-step casts required to prevent warnings. Why not try converting the construct to what you think it should be and see what the compiler says (of course that only helps you to work out what was in the mind of the original programmer if you're using the same compiler and same warning level as they were).
It does eeevil.
On most architectures, a pointer can be considered to be just another kind of number. On most architectures, long is as many bits as a pointer, so there is a 1-to-1 map between long values and pointers. But exceptions to both assumptions, especially the second, are not uncommon! (64-bit Windows, for example, has a 32-bit long and 64-bit pointers.)
long(v) is an alias for reinterpret_cast<long>(v), which carries no guarantees. Not really fit for any purpose, unless your ABI spec says otherwise.
However, for whatever reason, whoever wrote that code prefers int to long. So they again cross their fingers and hope that no essential information is thrown out in the bits that may possibly be lost in the long to int cast.
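For comparison, here is a sketch of a pointer-to-integer round trip that does not rely on the width of long. It uses std::uintptr_t, which is not mentioned in the original code; the type is optional in the standard but is provided by mainstream implementations. The helper names to_id and from_id are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: converts a pointer to an integer wide enough to hold
// it. std::uintptr_t, where available, can hold any void* value.
inline std::uintptr_t to_id(void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: converts the integer back; the result compares equal
// to the original pointer.
inline void* from_id(std::uintptr_t id) {
    return reinterpret_cast<void*>(id);
}
```

Narrowing the integer to int afterwards would discard high bits on platforms where pointers are wider than int, which is the risk in the construct from the question.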
Two uses of this are creating a unique object identifier, or trying to somehow package the pointer for some kind of arithmetic otherwise unsupported by pointers.
An opaque identifier can be a void*, so casting to integral type is unnecessary.
"Extracting" an integer from a pointer (for e.g. a division operation) can always be done by subtracting a base pointer to obtain a difference of type ptrdiff_t, which is usually long.