My question is regarding a code fragment, such as below:
#include <iostream>
#include <cstdint>

int main() {
    double a = -50;
    std::cout << a << "\n";
    uint8_t* b = reinterpret_cast<uint8_t*>(&a);
    b[7] &= 0x7F;
    std::cout << a << "\n";
    return 0;
}
As far as I can tell I am not breaking any rules and everything is well defined (as noted in the edit below, I forgot that uint8_t is not guaranteed to be allowed to alias other types). There is some implementation-defined behavior going on, but for the purpose of this question I don't think that is relevant.
I would expect this code to print -50, then 50, on systems where the double follows the IEEE standard, is 8 bytes long, and is stored in little-endian format. Now the question is: does the compiler guarantee that this happens? More specifically, turning on optimisations, can the compiler optimise away the middle b[7], either explicitly or implicitly, by simply keeping a in a register through the whole function? The second one obviously could be solved by specifying volatile double a, but is that needed?
Edit: As a note, I (mistakenly) remembered that uint8_t was required to be an alias for unsigned char, but the standard does not in fact specify that. I have also written the question in a way where, yes, the compiler can know everything here ahead of time, but modified to
#include <iostream>

int main() {
    double a;
    std::cin >> a;
    std::cout << a << "\n";
    unsigned char* b = reinterpret_cast<unsigned char*>(&a);
    b[7] &= 0x7F;
    std::cout << a << "\n";
    return 0;
}
one can see where the problem might arise. Here the strict aliasing rule is no longer violated, and a is not a compile-time constant. Richard Critten's comment, however, raises a curious point: if the aliased data can be examined but not written, is there a way to set individual bytes while still following the standard?
More specifically, turning on optimisations, can the compiler optimise away the middle b[7], either explicitly or implicitly, by simply keeping a in a register through the whole function?
The compiler can generate the double value 50 as a constant and pass that directly to the output function; b can be optimised away completely. Like most optimisations, this is due to the as-if rule:
[intro.abstract]
The semantic descriptions in this document define a parameterized nondeterministic abstract machine.
This document places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine.
Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
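To illustrate, here is a minimal sketch (not actual compiler output) of a program the implementation is allowed to substitute for the one above, assuming an 8-byte little-endian IEEE-754 double; b is gone entirely:

#include <iostream>

int main() {
    // The byte write clears the sign bit of -50, so under those assumptions
    // the whole program may be compiled as if it just printed two constants.
    std::cout << -50.0 << "\n";
    std::cout << 50.0 << "\n";
    return 0;
}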
The second one obviously could be solved by specifying volatile double a
That would prevent the optimisation, which would generally be considered to be the opposite of a solution.
Does the compiler guarantee that [50 is printed]?
You didn't mention what compiler you are asking about. I'm going to assume that you mean whether the standard guarantees this. It doesn't guarantee that universally. You are relying on several assumptions about the implementation:
If sizeof(double) < 8, then you access the object outside of its bounds, and the behaviour of the program is undefined.
If std::uint8_t is not a type alias of unsigned char, then it isn't allowed to alias double, and the behaviour of the program is undefined.
Given that the assumptions hold and the behaviour is thus well-defined, the second output will be a double value that is like -50, but whose byte at position 7 has had its most significant bit cleared. In the case of a little-endian IEEE-754 representation that bit is the sign bit, so the value would be 50. volatile is not needed to guarantee this, and it won't add a guarantee in case the behaviour of the program is undefined.
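As for the follow-up about setting individual bytes while staying within the standard, one well-defined sketch (still assuming an 8-byte little-endian IEEE-754 double) is to copy the object representation out with std::memcpy, modify it, and copy it back, which sidesteps the aliasing question entirely:

#include <cstring>
#include <iostream>

int main() {
    double a = -50;
    unsigned char bytes[sizeof a];
    std::memcpy(bytes, &a, sizeof a); // copy out the object representation
    bytes[7] &= 0x7F;                 // clear the sign bit (little-endian layout assumed)
    std::memcpy(&a, bytes, sizeof a); // copy the modified bytes back
    std::cout << a << "\n";           // prints 50 under those assumptions
    return 0;
}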
Related
I have the following sample code:
inline float successor(float f, bool const check)
{
    const unsigned long int mask = 0x7f800000U;
    unsigned long int i = *(unsigned long int*)&f;
    if (check)
    {
        if ((i & mask) == mask)
            return f;
    }
    i++;
    return *(float*)&i;
}

float next1(float a)
{
    return successor(a, true);
}

float next2(float a)
{
    return successor(a, false);
}
Under x86-64 clang 13.0.1, the code compiles as expected.
Under x86-64 clang 14.0.0 or 15, the output is merely a ret op for next1(float) and next2(float).
Compiler options: -march=x86-64-v3 -O3
The code and output are here: Godbolt.
The successor(float,bool) function is not a no-op.
As a note, the output is as expected under GCC, ICC, and MSVC. Am I missing something here?
*(unsigned long int*)&f is an immediate aliasing violation. f is a float. You are not allowed to access it through a pointer to unsigned long int. (And the same applies to *(float*)&i.)
So the code has undefined behavior and Clang likes to assume that code with undefined behavior is unreachable.
Compile with -fno-strict-aliasing to force Clang not to treat aliasing violations as undefined behavior that cannot happen (although that is probably not sufficient here, see below), or better, do not rely on undefined behavior. Instead use either std::bit_cast (since C++20) or std::memcpy to create a copy of f with the new type but the same object representation. That way your program will be valid standard C++ and not rely on the -fno-strict-aliasing compiler extension.
(And if you use std::memcpy, add a static_assert to verify that unsigned long int and float have the same size. That is not the case on all platforms, not even on all common ones. std::bit_cast has the check built in.)
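A minimal sketch of what that could look like, assuming C++20 and using std::uint32_t so the size actually matches float:

#include <bit>
#include <cstdint>

// Same logic as successor(), but the punning is done with std::bit_cast,
// so there is no aliasing violation and no size mismatch.
inline float successor(float f, bool const check)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
    const std::uint32_t mask = 0x7f800000U;
    std::uint32_t i = std::bit_cast<std::uint32_t>(f);
    if (check && (i & mask) == mask)
        return f; // exponent all ones: infinity or NaN, returned unchanged
    i++;
    return std::bit_cast<float>(i);
}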
As noticed by #CarstenS in the other answer, given that you are (at least on compiler explorer) compiling for the SysV ABI, unsigned long int (64bit) is indeed a different size than float (32bit). Consequently there is much more direct UB in that you are accessing memory out-of-bounds in the initialization of i. And as he also noticed Clang does seem to compile the code as intended when an integer type of matching size is used, even without -fno-strict-aliasing. This does not invalidate what I wrote above in general though.
Standards and UB aside, on your target platform float is 32 bits and long is 64 bits, so I am surprised by the clang 13 code (indeed I think you will get actual UB with -O0). If you use uint32_t instead of long, the problem goes away.
Some compiler writers interpret the Standard as deprecating "non-portable or erroneous" program constructs, including constructs which implementations for commonplace hardware had, to date, unanimously processed in a manner consistent with implementation-defined behavioral traits such as numeric representations.
Compilers that are designed for paying customers will look at a construct like:
unsigned long int i = *(unsigned long int*)&f; // f is of type float
and recognize that while converting the address of a float to an unsigned long* is a non-portable construct, it was almost certainly written for the purpose of examining the bits of a float. This is a very different situation from the one offered in the published Rationale as being the reason for the rule, which was more like:
int x;

int test(double *p)
{
    x = 1;
    *p = 2.0;
    return x;
}
In the latter situation, it would be theoretically possible that *p points to or overlaps x, and that the programmer knows what precedes and/or follows x in memory. The authors of the Standard recognized that having the function unconditionally return 1 would be incorrect behavior if that were the case, but decided that there was no need to mandate support for such dubious possibilities.
Returning to the original construct, that represents a completely different situation, since any compiler that isn't willfully blind to such things would know that the address being accessed via type unsigned long* was formed from a pointer of type float*. While the Standard wouldn't forbid compilers from being willfully blind to the possibility that an unsigned long* might actually hold the address of storage that will be accessed using type float, that's because the Standard saw no need to mandate that compiler writers do things which anyone wanting to sell compilers would do, with or without a mandate.
Probably not coincidentally, the compilers I'm aware of that would require the -fno-strict-aliasing option to usefully process constructs such as yours also require that flag in order to correctly process some constructs whose behavior is unambiguously specified by the Standard. Rather than jumping through hoops to accommodate deficient compiler configurations, a better course of action would be to simply use the "don't make buggy aliasing optimizations" option.
I want to know if it is possible to "reduce" the alignment of a datatype in C++. For example, the alignment of int is 4; I want to know if it's possible to set the alignment of int to 1 or 2. I tried using the alignas keyword but it didn't seem to work.
I want to know if this is something not being done by my compiler or the C++ standard doesn't allow this; for either case, I would like to know the reason why it is as such.
I want to know if it is possible to "reduce" the alignment of a datatype in C++.
It is not possible. From this Draft C++ Standard:
10.6.2 Alignment specifier [dcl.align]
…
5 The combined effect of all alignment-specifiers in a declaration shall not specify an alignment that is less strict than the alignment that would be required for the entity being declared if all alignment-specifiers appertaining to that entity were omitted.
The 'reason' for this is that, in most cases, alignment requirements are dictated by the hardware that is being targeted: if a given CPU requires that an int be stored in a 4-byte-aligned address then, if the compiler were allowed to generate code that puts such an int in a less strictly aligned memory location, the program would cause a hardware fault, when run. (Note that, on some platforms, the alignment requirement for an int is only 1 byte, even though access may be optimized when more strictly aligned.)
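For example, a quick sketch (assuming a platform where alignof(int) is 4): an alignment-specifier may make the requirement stricter, but a weaker one is ill-formed:

alignas(16) int widened;    // OK: 16 is at least as strict as alignof(int)
// alignas(1) int narrowed; // ill-formed: 1 is less strict than alignof(int)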
Some compilers may offer ways that appear to allow alignment reduction; for example, MSVC has the __declspec(align(#)) extension, which can be applied in a typedef statement. However, from the documentation: __declspec(align(#)) can only increase alignment restrictions:
#include <iostream>

typedef __declspec(align(1)) int MyInt; // No compiler error, but...

int main()
{
    std::cout << alignof(int) << "\n";   // "4"
    std::cout << alignof(MyInt) << "\n"; // "4" ...doesn't reduce the alignment requirement
    return 0;
}
It appears from other StackOverflow questions and from reading §9.5.1 of the ISO/IEC draft C++ standard that the use of unions to do a literal reinterpret_cast of data is undefined behavior.
Consider the code below. The goal is to take the integer value of 0xffff and literally interpret it as a series of bits in IEEE 754 floating point. (Binary convert shows visually how this is done.)
#include <iostream>
using namespace std;

union unionType {
    int myInt;
    float myFloat;
};

int main() {
    int i = 0xffff;
    unionType u;
    u.myInt = i;

    cout << "size of int " << sizeof(int) << endl;
    cout << "size of float " << sizeof(float) << endl;
    cout << "myInt " << u.myInt << endl;
    cout << "myFloat " << u.myFloat << endl;

    float theFloat = *reinterpret_cast<float*>(&i);
    cout << "theFloat " << theFloat << endl;
    return 0;
}
The output of this code, using both the GCC and Clang compilers, is as expected.
size of int 4
size of float 4
myInt 65535
myFloat 9.18341e-41
theFloat 9.18341e-41
My question is, does the standard actually preclude the value of myFloat from being deterministic? Is the use of a reinterpret_cast better in any way to perform this type of conversion?
The standard states the following in §9.5.1:
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
The last sentence, guaranteeing that all non-static members have the same address, seems to indicate the use of a union is guaranteed to be identical to the use of a reinterpret_cast, but the earlier statement about active data members seems to preclude this guarantee.
So which construct is more correct?
Edit:
Using Intel's icpc compiler, the above code produces even more interesting results:
$ icpc union.cpp
$ ./a.out
size of int 4
size of float 4
myInt 65535
myFloat 0
theFloat 0
The reason it's undefined is that there's no guarantee what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about treating an int object with value 0xffff as a float? It doesn't say anything other than that it is undefined.
Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.
This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.
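If you do want the reinterpretation with fully defined behaviour, one sketch (assuming, as the output above shows, that int and float have the same size) is to copy the object representation with std::memcpy instead of punning; in C++20, std::bit_cast expresses the same thing directly:

#include <cstring>
#include <iostream>

int main() {
    int i = 0xffff;
    float f;
    static_assert(sizeof f == sizeof i, "reinterpreting the bits requires equal sizes");
    std::memcpy(&f, &i, sizeof f);       // no aliasing violation, no active-member question
    std::cout << "f " << f << std::endl; // 9.18341e-41 with 32-bit IEEE-754 floats
    return 0;
}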
It's not undefined behavior. It's implementation defined behavior. The first does mean that bad things can happen. The other means that what will happen has to be defined by the implementation.
The reinterpret_cast violates the strict aliasing rule. So I do not think it will work reliably. The union trick is what people call type-punning and is usually allowed by compilers. The gcc folks document the behavior of the compiler: http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation
I think this should work with icpc as well (but they do not appear to document how they implemented it). But when I looked at the assembly, it looks like icc tries to cheat with float and use higher-precision floating point. Passing -fp-model source to the compiler fixed that. With that option, I get the same results as with gcc.
I do not think you want to use this flag in general, this is just a test to verify my theory.
So for icpc, I think if you switch your code from int/float to long/double, type-punning will work on icpc as well.
Undefined behavior does not mean bad things must happen. It means only that the language definition doesn't tell you what happens. This kind of type pun has been part of C and C++ programming since time immemorial (i.e., since 1969); it would take a particularly perverse implementor to write a compiler where this didn't work.
There are many discussions of strict aliasing (notably "What is the strict aliasing rule?" and "Strict aliasing rule and 'char *' pointers"), but this is a corner case I don't see explicitly addressed.
Consider this code:
int x;
char *x_alias = reinterpret_cast<char *>(&x);
x = 1;
*x_alias = 2; // [alias-write]
printf("x is now %d\n", x);
Must the printed value reflect the change in [alias-write]? (Clearly there are endianness and representation considerations, that's not my concern here.)
The famous [basic.lval] clause of the C++11 spec uses this language (emphasis mine):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
... various other conditions ...
a char or unsigned char type.
I can't figure out whether "access" refers only to read operations (read chars from a nonchar object) or also to write operations (write chars onto a nonchar object). If there's a formal definition of "access" in the spec, I can't find it, but in other places the spec seems to use "access" for reads and "update" for writes.
This is of particular interest when deserializing; it's convenient and efficient to bring data directly from a wire into an object, without requiring an intermediate memcpy() from a char-buffer into the object.
is it defined to _write_ to a char*, then _read_ from an aliased nonchar*?
Yes.
Must the printed value reflect the change in [alias-write]?
Yes.
Strict aliasing says ((un)signed) char* can alias anything. The word "access" means both read and write operations.
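For the deserialization case mentioned in the question, here is a minimal sketch (the wire bytes and the 4-byte little-endian int are assumptions) of filling a non-char object byte by byte through an unsigned char*:

#include <cstdio>

int main() {
    int x = 0;
    unsigned char *p = reinterpret_cast<unsigned char *>(&x);
    const unsigned char wire[] = {0x01, 0x00, 0x00, 0x00}; // pretend network data
    static_assert(sizeof wire == sizeof x, "sketch assumes a 4-byte int");
    for (unsigned i = 0; i < sizeof x; ++i)
        p[i] = wire[i]; // writing through the unsigned char alias is allowed
    std::printf("x is now %d\n", x); // prints 1 on a little-endian machine
    return 0;
}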
The authors of the C89 Standard wanted to allow e.g.
int thing;
unsigned char *p = (unsigned char *)&thing;
int i;
for (i=0; i<sizeof thing; i++)
    p[i] = getbyte();
and
int thing = somevalue();
unsigned char *p = (unsigned char *)&thing;
int i;
for (i=0; i<sizeof thing; i++)
    putbyte(p[i]);
but not to require that compilers handle any possible aliasing given something like:
/* global definitions */
int thing;
double *p;

int x(double *p)
{
    thing = 1;
    *p = 1.0;
    return thing;
}
There are two ways in which the supported and non-supported cases differ: (1) in the cases to be supported, the access is made using a character-type pointer rather than some other type, and (2) after the address of the thing in question is converted to another type, all accesses to the storage using that pointer are made before the next access using the original lvalue. The authors of the Standard unfortunately regarded only the first as significant, even though the second would have been a much more reliable way of identifying cases where aliasing may be important. If the Standard had focused on the second, it might not have required compilers to recognize aliasing in your example. As it is, though, the Standard requires that compilers recognize aliasing any time programs use character types, despite the needless impact on the performance of code that processes actual character data.
Rather than fixing this fundamental mistake, other standards for both C and C++ have simply kept on with the same broken approach.