constexpr: gcc tries harder than clang to evaluate constexpr - c++

I'm using godbolt to look at the code generated by gcc and clang.
I tried to implement the djb2 hash.
gcc always tries its best to evaluate a constexpr function at compile time.
clang evaluates the constexpr function only if the variable it initializes is constexpr.
Here is the example:
constexpr int djb2(char const *str)
{
    int hash = 5381;
    int c = 0;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

int main()
{
    int i = djb2("hello you :)");
}
With this example, gcc evaluates i at compile time, but clang evaluates it at run time.
If I add constexpr to i, clang also evaluates it at compile time.
Does the standard say anything about this?
EDIT: Thanks to all. So, as I understand it, without constexpr the compiler does whatever it wants; with constexpr, the compiler is forced to evaluate the constant.

Your program has undefined behavior.
The shift hash << 5 will overflow, which is undefined behavior for signed integer types before C++20.
In particular that means that calling your function can never yield a constant expression, which you can verify by adding constexpr to your declaration of i. Both compilers will then have to diagnose the undefined behavior and will tell you about it.
Give hash an unsigned type and your code will actually have well-defined behavior; the expression djb2("hello you :)") will then be a constant expression that can be evaluated at compile time, assuming you are using C++14 or later (the loop was not allowed in a constexpr function in C++11).
This still doesn't require the compiler to actually do the evaluation at compile-time, but then you can force it by adding constexpr to the declaration of i.
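For illustration, a minimal sketch of the fixed version (hash made unsigned, constexpr on i to require compile-time evaluation; assumes C++14 or later):

constexpr unsigned int djb2(char const *str)
{
    unsigned int hash = 5381;
    int c = 0;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c; unsigned wraps, no UB */

    return hash;
}

int main()
{
    constexpr unsigned int i = djb2("hello you :)"); // must be a constant expression
}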
"Force" here is relative. Because of the as-if rule and because there is no observable difference between evaluation at compile-time and runtime, the compiler is still not technically required to really do the computation only at compile-time, but it requires the compiler to check the whole calculation for validity, which is basically the same as evaluating it, so it would be unreasonable for the compiler to repeat the evaluation at runtime.
Similarly "can be evaluated at compile-time" is relative as well. Again for the same reasons as above, a compiler can still choose to do the calculations at compile-time even if it is not a constant expression as long as there wouldn't be any observable difference in behavior. This is purely a matter of optimizer quality. In your specific case the program has undefined behavior, so the compilers can choose to do what they want anyway.

Related

Clang 14 and 15 apparently optimizing away code that compiles as expected under Clang 13, ICC, GCC, MSVC

I have the following sample code:
inline float successor(float f, bool const check)
{
    const unsigned long int mask = 0x7f800000U;
    unsigned long int i = *(unsigned long int*)&f;

    if (check)
    {
        if ((i & mask) == mask)
            return f;
    }

    i++;
    return *(float*)&i;
}

float next1(float a)
{
    return successor(a, true);
}

float next2(float a)
{
    return successor(a, false);
}
Under x86-64 clang 13.0.1, the code compiles as expected.
Under x86-64 clang 14.0.0 or 15, the output is merely a ret op for next1(float) and next2(float).
Compiler options: -march=x86-64-v3 -O3
The code and output are here: Godbolt.
The successor(float,bool) function is not a no-op.
As a note, the output is as expected under GCC, ICC, and MSVC. Am I missing something here?
*(unsigned long int*)&f is an immediate aliasing violation. f is a float. You are not allowed to access it through a pointer to unsigned long int. (And the same applies to *(float*)&i.)
So the code has undefined behavior and Clang likes to assume that code with undefined behavior is unreachable.
Compile with -fno-strict-aliasing to force Clang not to treat aliasing violations as undefined behavior that cannot happen (although that is probably not sufficient here, see below), or better, do not rely on undefined behavior at all. Instead use either std::bit_cast (since C++20) or std::memcpy to create a copy of f with the new type but the same object representation. That way your program will be valid standard C++ and will not rely on the -fno-strict-aliasing compiler extension.
(And if you use std::memcpy add a static_assert to verify that unsigned long int and float have the same size. That is not true on all platforms and also not on all common platforms. std::bit_cast has the test built-in.)
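For illustration, a sketch of a well-defined version using std::bit_cast and an integer type of matching size (std::uint32_t, assuming a 32-bit IEEE float):

#include <bit>
#include <cstdint>

inline float successor(float f, bool const check)
{
    const std::uint32_t mask = 0x7f800000U;
    std::uint32_t i = std::bit_cast<std::uint32_t>(f); // well-defined bit copy, no aliasing violation

    if (check && (i & mask) == mask)
        return f; // exponent all ones: infinity or NaN, leave unchanged

    ++i;
    return std::bit_cast<float>(i);
}

std::bit_cast itself enforces that source and destination types have the same size, so the static_assert mentioned above comes for free.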
As noticed by @CarstenS in the other answer, given that you are (at least on compiler explorer) compiling for the SysV ABI, unsigned long int (64-bit) is indeed a different size than float (32-bit). Consequently there is much more direct UB in that you are accessing memory out-of-bounds in the initialization of i. And as he also noticed, Clang does seem to compile the code as intended when an integer type of matching size is used, even without -fno-strict-aliasing. This does not invalidate what I wrote above in general, though.
Standards and UB aside, on your target platform float is 32 bits and long is 64 bits, so I am surprised by the clang 13 code (indeed I think you will get actual UB with -O0). If you use uint32_t instead of long, the problem goes away.
Some compiler writers interpret the Standard as deprecating "non-portable or erroneous" program constructs, including constructs which implementations for commonplace hardware had to date had unanimously processed in a manner consistent with implementation-defined behavioral traits such as numeric representations.
Compilers that are designed for paying customers will look at a construct like:
unsigned long int i = *(unsigned long int*)&f; // f is of type float
and recognize that while converting the address of a float to an unsigned long* is a non-portable construct, it was almost certainly written for the purpose of examining the bits of a float object. This is a very different situation from the one offered in the published Rationale as the reason for the rule, which was more like:
int x;

int test(double *p)
{
    x = 1;
    *p = 2.0;
    return x;
}
In the latter situation, it would be theoretically possible that *p points to or overlaps x, and that the programmer knows what precedes and/or follows x in memory. The authors of the Standard recognized that having the function unconditionally return 1 would be incorrect behavior if that were the case, but decided that there was no need to mandate support for such dubious possibilities.
Returning to the original, that represents a completely different situation since any compiler that isn't willfully blind to such things would know that the address being accessed via type unsigned long* was formed from a pointer of type float*. While the Standard wouldn't forbid compilers from being willfully blind to the possibility that a float* might actually hold the address of storage that will be accessed using type float, that's because the Standard saw no need to mandate that compiler writers do things which anyone wanting to sell compilers would do, with or without a mandate.
Probably not coincidentally, the compilers I'm aware of that would require a -fno-strict-aliasing option to usefully process constructs such as yours also require that flag in order to correctly process some constructs whose behavior is unambiguously specified by the Standard. Rather than jumping through hoops to accommodate deficient compiler configurations, a better course of action would be simply to use the "don't make buggy aliasing optimizations" option.

C++: int calculations during compile time (not during run time)

Simple question:
Why does the code below work?
int main() {
    const int a = 4;
    const int b = 16;
    const int size = b/a;
    int arr[size] = {0, 1, 2, 3};
    return 0;
}
I thought the size of a static array has to be defined at compile time, thus only with an "int" literal. In the code above, the code compiles although the size is a calculation. Is this calculation done during compile time?
If yes, maybe my understanding of compiling and running is wrong: compiling just goes through the syntax and translates the code to machine code, but does not do any calculations...
Thank you!
Is this calculation done during compile time?
Yes. Compilers do a lot of optimization and calculation for you; the initialization in your code is fine even without any optimization, as it is the result of the compiler's pre-calculation.
Generally, calculation here includes constexpr, const declarations and so on, which are already part of the definition of the language itself (see constant expression).
compile-time const (example): just see the output of the example.
compile-time constexpr (example):
The constexpr specifier declares that it is possible to evaluate the value of the function or variable at compile time.
This is how an array can be initialized; an array declaration is as below:
noptr-declarator [ expr(optional) ] attr(optional)
Here, expr is:
an integral constant expression (until C++14)
a converted constant expression of type std::size_t (since C++14), which evaluates to a value greater than zero
Both of these are constant expressions, that is:
an expression that can be evaluated at compile time.
So, using this so-called pre-calculation to initialize an array is fine.
As a follow-up: there are also many ways to save calculation time by having work done at compile time; they are shown in the links above.
Just to mention something different: as for optimizations, you can see the difference between the assembly code of the -O0 and -O3 versions when calculating the sum from 1 to 100; it's jaw-dropping. You'll see the result, 5050, directly in the assembly code of the -O3 version. That is also a kind of compile-time calculation, but it is not enabled in every situation.
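A minimal sketch of that experiment (the function name is mine; with -O3, current gcc and clang typically fold the whole loop into the constant):

int sum_to_100()
{
    int sum = 0;
    for (int i = 1; i <= 100; ++i)
        sum += i;
    return sum; // with -O3 this typically compiles down to "mov eax, 5050; ret"
}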
I thought the size of static arrays have to be defined during compile time, thus only with an "int" literal.
The first part is true, but the second part was only true before C++11 (you could also have done const int i = 1 + 2). From C++11, the rules say that the initializing expression is evaluated to see if it yields a constant expression.
There is also the rule that a const integral type is implicitly constexpr (i.e. computed at compile time), so long as the initializing expression is computable at compile time.
So in this expression:
const int size = b/a;
the variable size is required to be computed at compile time, so long as the expression b/a is a constant expression. Since a and b are also const ints that are initialized with literals, the initialization of size is a constant expression. (Note that removing const from any of the declarations will make size a non-constant expression.)
So size is required to be computed at compile time (the compiler has no choice in the matter, regardless of optimizations), and so it can be used as an array dimension.
Here are the complete rules for how this works.
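A quick sketch of the rule in action, reusing the declarations from the question (the commented-out line shows what breaks once const is removed):

const int a = 4;
const int b = 16;
const int size = b/a;         // constant expression: fine as an array bound
int arr[size] = {0, 1, 2, 3};

int c = 4;                    // not const
// int bad[16 / c];           // ill-formed in standard C++: 16 / c is not a constant expression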

Comparing two constexpr pointers is not constexpr?

I am looking for a way to map types to numeric values at compile time, ideally without using a hash as proposed in this answer.
Since pointers can be constexpr, I tried this:
struct Base{};
template<typename T> struct instance : public Base{};

template<typename T>
constexpr auto type_instance = instance<T>{};

template<typename T>
constexpr const Base* type_pointer = &type_instance<T>;

constexpr auto x = type_pointer<int> - type_pointer<float>; // not a constant expression
Both gcc and clang reject this code because type_pointer<int> - type_pointer<float> is not a constant expression, see here, for instance.
Why though?
I can understand that the difference between both values is not going to be stable from one compilation to the next, but within one compilation, it should be constexpr, IMHO.
Subtraction of two non-null pointers which do not point into the same array or to the same object (including one-past-the-array/object) is undefined behavior, see [expr.add] (in particular paragraph 5 and 7) of the C++17 standard (final draft).
Expressions which would have core undefined behavior[1] if evaluated are never constant expressions, see [expr.const]/2.6.
Therefore type_pointer<int> - type_pointer<float> cannot be a constant expression, because the two pointers are to unrelated objects.
Since type_pointer<int> - type_pointer<float> is not a constant expression, it cannot be used to initialize a constexpr variable such as
constexpr auto x = type_pointer<int> - type_pointer<float>;
Trying to use a non-constant expression as the initializer of a constexpr variable makes the program ill-formed and requires the compiler to print a diagnostic message. That is the error message you are seeing.
Basically compilers are required to diagnose core undefined behavior when it appears in purely compile-time contexts.
You can see that there will be no error if the pointers are to the same object, e.g.:
constexpr auto x = type_pointer<int> - type_pointer<int>;
Here the subtraction is well-defined and the initializer is a constant expression. So the code will compile (and won't have undefined behavior). x will have a well-defined value of 0.
Be aware that if you make x non-constexpr the compiler won't be required to diagnose the undefined behavior and to print a diagnostic message anymore. It is therefore likely to compile.
Subtracting unrelated pointers is still undefined behavior though, not merely unspecified behavior. Therefore you will lose any guarantee about what the resulting program will do. It does not just mean that you will get different values for x in each compilation/execution of the code.
[1] Core undefined behavior here refers to undefined behavior in the core language, in contrast with undefined behavior due to use of the standard library. It is unspecified whether undefined behavior as specified for the library causes an (otherwise constant) expression to not be a constant expression, see final sentence before the example in [expr.const]/2.
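As a side note on where the boundary lies: equality comparison of unrelated pointers is well-defined (distinct complete objects simply compare unequal), so it remains usable in constant expressions even though subtraction does not. A sketch building on the question's declarations (same_type_v is a made-up name):

template<typename T, typename U>
constexpr bool same_type_v = (type_pointer<T> == type_pointer<U>); // equality is well-defined

static_assert(same_type_v<int, int>, "same instantiation, same object");
static_assert(!same_type_v<int, float>, "distinct objects compare unequal");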

Endianness in constexpr

I want to create a constexpr function that returns the endianness of the system, like so:
constexpr bool IsBigEndian()
{
    constexpr int32_t one = 1;
    return (reinterpret_cast<const int8_t&>(one) == 0);
}
Now, since the function will get executed at compile time rather than on the actual target machine, what guarantee does the C++ spec give to make sure that the correct result is returned?
None. In fact, the program is ill-formed. From [expr.const]:
A conditional-expression e is a core constant expression unless the evaluation of e, following the rules of the abstract machine (1.9), would evaluate one of the following expressions:
— [...]
— a reinterpret_cast.
— [...]
And, from [dcl.constexpr]:
For a constexpr function or constexpr constructor that is neither defaulted nor a template, if no argument values exist such that an invocation of the function or constructor could be an evaluated subexpression of a core constant expression (5.20), or, for a constructor, a constant initializer for some object (3.6.2), the program is ill-formed; no diagnostic required.
The way to do this is just to hope that your compiler is nice enough to provide macros for the endianness of your machine. For instance, on gcc, I could use __BYTE_ORDER__:
constexpr bool IsBigEndian() {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return false;
#else
    return true;
#endif
}
As stated by Barry, your code is not legal C++. However, even if you took away the constexpr part, it would still not be legal C++. Your code violates strict aliasing rules and therefore represents undefined behavior.
Indeed, there is no way in C++ to detect the endian-ness of an object without invoking undefined behavior. Casting it to a char* doesn't work, because the standard doesn't require big or little endian order. So while you could read the data through a byte, you would not be able to legally infer anything from that value.
And type punning through a union fails because you're not allowed to type pun through a union in C++ at all. And even if you did... again, C++ does not restrict implementations to big or little endian order.
So as far as C++ as a standard is concerned, there is no way to detect this, whether at compile-time or runtime.
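Worth adding for readers on newer standards: since C++20 the library answers this directly via std::endian in <bit>, with no casts and no undefined behavior:

#include <bit>

constexpr bool IsBigEndian()
{
    return std::endian::native == std::endian::big;
}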

compiler behaviour on return value of new

#include <iostream>
using namespace std;

int main(void) {
    int size = -2;
    int* p = new int[size];
    cout << p << endl;
    return 0;
}
The above code compiles without any problem on Visual Studio 2010.
But:
#include <iostream>
using namespace std;

int main(void) {
    const int size = -2;
    int* p = new int[size];
    cout << p << endl;
    return 0;
}
This code (with the const keyword added) gives a compile-time error: the size of an array cannot be negative.
Why the different results?
By making it a constant expression, you've given the compiler the ability to diagnose the problem at compile time. Since you've made it const, the compiler can easily figure out that the value will necessarily be negative when you pass it to new.
With a non-constant expression, you have the same problem, but it takes more intelligence on the part of the compiler to detect it. Specifically, the compiler has to detect that the value doesn't change between the time you initialize it and the time you pass it to new. With optimization turned on, chances are pretty good at least some compilers still could/would detect the problem, because for optimization purposes they detect data flow so they'd "realize" that in this case the value remains constant, even though you haven't specified it as const.
Because const int size = -2; can be replaced at compile time by the compiler, whereas a non-const cannot, the compiler can tell that size is negative and doesn't allow the allocation.
There's no way for the compiler to tell whether int* p = new int[size]; is legal or not if size is not const: size can be altered in the program or by a different thread. A const can't be.
You'll run into undefined behavior with the first sample anyway; it's just not legal.
In the first, size is a variable whose value happens to be -2, but the compiler doesn't track it, as it is a variable (at least for diagnostic purposes; I'm pretty sure the optimizing phase can track it). Execution should be problematic (I don't know whether it is guaranteed to throw an exception or is just UB).
In the second, size is a constant, and thus its value is known and verified at compile time.
The second yields a compilation error because the compiler can detect at compile time that the rules are being violated.
The first yields undefined behavior[1] because the compiler cannot detect the illegal size at compile time.
Rationale:
Once you make the variable const, the compiler knows that the value of size is not supposed to change at any point during the execution of the program. So the compiler applies its optimization and simply replaces the const integral size with its constant value -2 wherever it is used at compile time; while doing so, it sees that the rules are not being followed[1] and complains with an error.
In the first case, not adding const prevents the compiler from applying the optimization mentioned above, because it cannot be sure that the variable size will never change during the execution of the program. So it cannot detect the illegal size at all; however, you end up with undefined behavior at run time.
[1] Reference:
C++03 5.3.4 New [expr.new]
noptr-new-declarator:
    [ expression ] attribute-specifier-seq(opt)
    noptr-new-declarator [ constant-expression ] attribute-specifier-seq(opt)
The expression and constant-expression are further explained in 5.3.4/6 and 5.3.4/7:
Every constant-expression in a noptr-new-declarator shall be an integral constant expression (5.19) and evaluate to a strictly positive value. The expression in a direct-new-declarator shall have integral or enumeration type (3.9.1) with a non-negative value. [Example: if n is a variable of type int, then new float[n][5] is well-formed (because n is the expression of a direct-new-declarator), but new float[5][n] is ill-formed (because n is not a constant-expression). If n is negative, the effect of new float[n][5] is undefined. ]
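For completeness: since C++11 the run-time case is no longer plain undefined behavior; a negative bound makes the new-expression throw std::bad_array_new_length. A minimal sketch:

#include <iostream>
#include <new>

int main()
{
    int size = -2;
    try {
        int* p = new int[size]; // since C++11: throws std::bad_array_new_length
        delete[] p;
    } catch (const std::bad_array_new_length& e) {
        std::cout << "bad_array_new_length: " << e.what() << '\n';
    }
    return 0;
}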