Endianness in constexpr

I want to create a constexpr function that returns the endianness of the system, like so:
constexpr bool IsBigEndian()
{
constexpr int32_t one = 1;
return (reinterpret_cast<const int8_t&>(one) == 0);
}
Now, since the function will get executed at compile time rather than on the actual target machine, what guarantee does the C++ spec give to make sure that the correct result is returned?

None. In fact, the program is ill-formed. From [expr.const]:
A conditional-expression e is a core constant expression unless the evaluation of e, following the rules of the
abstract machine (1.9), would evaluate one of the following expressions:
— [...]
— a reinterpret_cast.
— [...]
And, from [dcl.constexpr]:
For a constexpr function or constexpr constructor that is neither defaulted nor a template, if no argument
values exist such that an invocation of the function or constructor could be an evaluated subexpression of
a core constant expression (5.20), or, for a constructor, a constant initializer for some object (3.6.2), the
program is ill-formed; no diagnostic required.
The way to do this is just to hope that your compiler is nice enough to provide macros for the endianness of your machine. For instance, on gcc, I could use __BYTE_ORDER__:
constexpr bool IsBigEndian() {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
return false;
#else
return true;
#endif
}
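Since C++20 the standard library itself exposes this information as std::endian in the <bit> header, so the compiler-specific macro is only needed as a fallback. The following is a minimal sketch, not a definitive implementation: the name is_big_endian and the fallback order are choices made for this illustration, and, like the macro version above, it reports false on a mixed-endian platform.

```cpp
// Include <bit> only when the toolchain ships it (it defines std::endian in C++20).
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>
#  endif
#endif

// Hypothetical helper name; returns true only for a strictly big-endian target.
constexpr bool is_big_endian() {
#if defined(__cpp_lib_endian)
    // Standard C++20 route: no casts, so it is valid in constant expressions.
    return std::endian::native == std::endian::big;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
    // Fallback: the GCC/Clang predefined macros used in the answer above.
    return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
#else
#  error "no compile-time endianness information available on this compiler"
#endif
}

// The result is usable wherever a constant expression is required.
static_assert(is_big_endian() || !is_big_endian(), "evaluated at compile time");
```

Because no reinterpret_cast is involved, the function can be called in a static_assert or used as a template argument.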

As stated by Barry, your code is not legal C++. However, even if you took away the constexpr part, it would still not be legal C++. Your code violates strict aliasing rules and therefore represents undefined behavior.
Indeed, there is no way in C++ to detect the endianness of an object without invoking undefined behavior. Casting it to a char* doesn't work, because the standard doesn't require big or little endian order. So while you could read the data through a byte, you would not be able to legally infer anything from that value.
And type punning through a union fails because you're not allowed to type pun through a union in C++ at all. And even if you did... again, C++ does not restrict implementations to big or little endian order.
So as far as C++ as a standard is concerned, there is no way to detect this, whether at compile-time or runtime.

Related

reinterpret_cast usage to manipulate bytes

I was reading here how to use the byteswap function. I don't understand why bit_cast is actually needed instead of using reinterpret_cast to char*. As I understand it, this cast does not violate the strict aliasing rule. I read that the second version below could be wrong because it accesses unaligned memory. That may be so, but at this point I'm a bit confused: if the access is UB due to unaligned memory, when is it possible to manipulate bytes with reinterpret_cast at all? According to the standard, the cast should allow reading and writing the memory.
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
static_assert(std::has_unique_object_representations_v<T>,
"T may not have padding bits");
auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
std::ranges::reverse(value_representation);
return std::bit_cast<T>(value_representation);
}
template<std::integral T>
void byteswap(T& value) noexcept
{
static_assert(std::has_unique_object_representations_v<T>,
"T may not have padding bits");
char* value_representation = reinterpret_cast<char*>(&value); // take the address; casting the integer value itself would not point at the object
std::reverse(value_representation, value_representation+sizeof(T));
}

The primary reason is that reinterpret_cast cannot be used in constant expression evaluation, while std::bit_cast can. And std::byteswap is specified to be constexpr.
If you added constexpr to the declaration in your implementation, it would be ill-formed, no diagnostic required, because there is no specialization of it that could be called as subexpression of a constant expression.
Without the constexpr it is not ill-formed, but cannot be called as subexpression of a constant expression, which std::byteswap is supposed to allow.
Furthermore, there is a defect in the standard:
The standard technically does not allow doing pointer arithmetic on the reinterpret_cast<char*>(&value) pointer (and also doesn't really specify a meaning for reading and writing through such a pointer).
The intention is that the char* pointer should be a pointer into the object representation of the object, considered as an array of characters. But currently the standard just says that the reinterpret_cast<char*>(&value) pointer still points to the original object, not its object representation. See P1839 for a paper proposing to correct the specification to be more in line with the usual assumptions.
The implementation from cppreference is also making an assumption that might not be guaranteed to be true: Whether std::array<std::byte, sizeof(T)> is guaranteed to have the same size as T. Of course that should hold in practice and std::bit_cast will fail to compile if it doesn't.
If you want to read some discussion on whether or not it is guaranteed in theory, see the questions std::bit_cast with std::array, Is the size of std::array defined by standard and What is the sizeof std::array<char, N>?
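For run-time code that cannot use std::bit_cast, one commonly used way to avoid both the reinterpret_cast and the pointer arithmetic on the object itself is to copy the bytes into a separate buffer with std::memcpy. This is only a sketch under stated assumptions: byteswap_via_memcpy is a name invented here, the function is deliberately not constexpr, and the padding-bit check from the cppreference version (which needs C++17) is left out to keep it short.

```cpp
#include <algorithm>
#include <cstring>
#include <type_traits>

// Hypothetical run-time-only byteswap: works on a copy of the object
// representation instead of aliasing the object through a char*.
template <typename T>
T byteswap_via_memcpy(T value) noexcept {
    static_assert(std::is_integral<T>::value, "T must be an integral type");

    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // copy the object representation out
    std::reverse(bytes, bytes + sizeof(T));  // pointer arithmetic on a real array
    std::memcpy(&value, bytes, sizeof(T));   // copy the reversed bytes back
    return value;
}
```

Swapping twice returns the original value, and for a 32-bit unsigned int the call byteswap_via_memcpy(0x11223344u) yields 0x44332211u.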

Does implicit object creation apply in constant expressions?

#include <memory>
int main() {
constexpr auto v = [] {
std::allocator<char> a;
auto x = a.allocate(10);
x[2] = 1;
auto r = x[2];
a.deallocate(x, 10);
return r;
}();
return v;
}
Is the program ill-formed? Clang thinks so, GCC and MSVC don't: https://godbolt.org/z/o3bcbxKWz
Removing the constexpr I think the program is not ill-formed and has well-defined behavior:
By [allocator.members]/5 the call a.allocate(10) starts the lifetime of the char[10] array it allocates storage for.
According to [intro.object]/13 starting the lifetime of an array of type char implicitly creates objects in its storage.
Scalar types such as char are implicit lifetime types. ([basic.types.general]/9)
[intro.object]/10 then says that objects of type char are created in the storage of the char[10] array (and their lifetime started) if that can give the program defined behavior.
Without beginning the lifetime of the char object at x[2], the program without constexpr would have undefined behavior due to the write to x[2] outside its lifetime, but the char object can be implicitly created due to the arguments above, making the program behavior well-defined to exit with status 1.
With constexpr, I am wondering if the program is ill-formed or not. Does implicit object creation apply in constant expressions?
According to [intro.object]/10 objects are implicitly created to give the program defined behavior, but does being ill-formed count as defined behavior?
If not, then the program should not be ill-formed because of implicit creation of the char object for x[2].
If yes, then the next question would be if it is unspecified whether the program is ill-formed or not, because [intro.object]/10 also says that it is unspecified which objects are implicitly created if multiple sets can give the program defined behavior.
From a language design perspective I would expect that implicit object creation is not supposed to happen in constant expressions, because verifying the (non-)existence of a set of objects making the constant expression valid is probably infeasible for a compiler in general.

2469. Implicit object creation vs constant expressions
It is not intended that implicit object creation, as described in 6.7.2 [intro.object] paragraph 10, should occur during constant expression evaluation, but there is currently no wording prohibiting it.

Clang is incorrect here. You've already cited all the parts in the spec that make it well-formed. std::allocator<T>::allocate is constexpr; you get a char*; allocator<T>::allocate creates an array of T; creating an array of char implicitly creates objects; accessing a char would otherwise be UB, but implicit object creation (IOC) prevents that, so UB doesn't happen. Therefore: the code isn't allowed to be ill-formed.
Clang claims full support for both IOC and constexpr allocators, so this code should work.
Does implicit object creation apply in constant expressions?
All expressions are core constant expressions unless [expr.const]/5 explicitly excludes them. Nothing there mentions operations whose potential UB determines which objects are created, so such operations must be included.
IOC prevents an expression from being UB.
I would expect that implicit object creation is not supposed to happen in constant expressions, because verifying the (non-)existence of a set of objects making the constant expression valid is probably infeasible for a compiler in general.
You have forgotten about the other restrictions on constexpr code. So long as [expr.const]/5 continues to explicitly forbid reinterpret_cast and conversions from void*, the number of ways you can abuse IOC is pretty limited. You cannot, for example, take the pointer returned by your allocate(10) call and convert it to an int*. So the compiler knows that the only objects that can be implicitly created in that storage are chars.
So at constant evaluation time, the compiler could just take the result of allocator<char>::allocate and create all the char members of that array immediately before returning it. There's no constexpr-valid way for you to take that storage and implicitly create anything other than chars.
And using allocator<T>::allocate when T isn't a byte-wise type will not implicitly create objects in that storage. So either you're just getting a pointer to an array of unformed elements, or you're getting a pointer to an array of byte-wise types.
I'd guess Clang forgot to check this particular case.

constexpr: gcc is trying harder to eval constexpr than clang

I'm using godbolt to look at the code generated by gcc and clang.
I tried to implement the djb2 hash.
gcc always tries its best to evaluate constexpr functions at compile time.
clang evaluates a constexpr function at compile time only if the variable is constexpr.
Let's see the example:
constexpr int djb2(char const *str)
{
int hash = 5381;
int c = 0;
while ((c = *str++))
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
int main()
{
int i = djb2("hello you :)");
}
With this example, gcc evaluates i at compile time, but clang does so at run time.
If I add constexpr to i, clang also evaluates it at compile time.
Do you know if the standard says anything about that?
EDIT: thanks to all. So, as I understand it, without constexpr the compiler does what it wants. With constexpr, the compiler is forced to evaluate the constant.

Your program has undefined behavior.
The shift hash << 5 will overflow which has undefined behavior for signed integer types before C++20.
In particular that means that calling your function can never yield a constant expression, which you can verify by adding constexpr to your declaration of i. Both compilers will then have to diagnose the undefined behavior and will tell you about it.
Give hash an unsigned type and your code will actually have well-defined behavior and the expression djb2("hello you :)") will actually be a constant expression that can be evaluated at compile-time, assuming you are using C++14 or later (the loop was not allowed in a constexpr function in C++11).
This still doesn't require the compiler to actually do the evaluation at compile-time, but then you can force it by adding constexpr to the declaration of i.
"Force" here is relative. Because of the as-if rule and because there is no observable difference between evaluation at compile-time and runtime, the compiler is still not technically required to really do the computation only at compile-time, but it requires the compiler to check the whole calculation for validity, which is basically the same as evaluating it, so it would be unreasonable for the compiler to repeat the evaluation at runtime.
Similarly "can be evaluated at compile-time" is relative as well. Again for the same reasons as above, a compiler can still choose to do the calculations at compile-time even if it is not a constant expression as long as there wouldn't be any observable difference in behavior. This is purely a matter of optimizer quality. In your specific case the program has undefined behavior, so the compilers can choose to do what they want anyway.
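The fix described above can be sketched as follows. This is an illustration rather than the asker's final code: it assumes C++14 or later, switches hash to unsigned so the arithmetic wraps instead of overflowing, and the variable name compile_time_hash is made up for the example.

```cpp
// djb2 with an unsigned accumulator: unsigned arithmetic wraps modulo 2^N,
// so the shift and the additions are well-defined for any input length.
constexpr unsigned djb2(const char* str) {
    unsigned hash = 5381;
    while (*str != '\0') {
        // hash * 33 + c
        hash = ((hash << 5) + hash) + static_cast<unsigned>(*str);
        ++str;
    }
    return hash;
}

// constexpr on the variable requires the initializer to be a constant
// expression, so the compiler must check the whole evaluation.
constexpr unsigned compile_time_hash = djb2("hello you :)");

// Small cases that are easy to verify by hand: 5381 * 33 + 'a' (97) = 177670.
static_assert(djb2("") == 5381u, "empty string keeps the seed");
static_assert(djb2("a") == 177670u, "one step of hash * 33 + c");
```

With the signed version from the question, the constexpr declaration would be rejected once the overflow is reached, which is the diagnostic the answer refers to.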

Comparing two constexpr pointers is not constexpr?

I am looking for a way to map types to numeric values at compile time, ideally without using a hash as proposed in this answer.
Since pointers can be constexpr, I tried this:
struct Base{};
template<typename T> struct instance : public Base{};
template<typename T>
constexpr auto type_instance = instance<T>{};
template<typename T>
constexpr const Base* type_pointer = &type_instance<T>;
constexpr auto x = type_pointer<int> - type_pointer<float>; // not a constant expression
Both gcc and clang reject this code because type_pointer<int> - type_pointer<float> is not a constant expression, see here, for instance.
Why though?
I can understand that the difference between both values is not going to be stable from one compilation to the next, but within one compilation, it should be constexpr, IMHO.

Subtraction of two non-null pointers which do not point into the same array or to the same object (including one-past-the-array/object) is undefined behavior, see [expr.add] (in particular paragraphs 5 and 7) of the C++17 standard (final draft).
Expressions which would have core undefined behavior[1] if evaluated are never constant expressions, see [expr.const]/2.6.
Therefore type_pointer<int> - type_pointer<float> cannot be a constant expression, because the two pointers are to unrelated objects.
Since type_pointer<int> - type_pointer<float> is not a constant expression, it cannot be used to initialize a constexpr variable such as
constexpr auto x = type_pointer<int> - type_pointer<float>;
Trying to use a non-constant expression as initializer to a constexpr variable makes the program ill-formed and requires the compiler to print a diagnostic message. This is what the error message you are seeing is.
Basically compilers are required to diagnose core undefined behavior when it appears in purely compile-time contexts.
You can see that there will be no error if the pointers are to the same object, e.g.:
constexpr auto x = type_pointer<int> - type_pointer<int>;
Here the subtraction is well-defined and the initializer is a constant expression. So the code will compile (and won't have undefined behavior). x will have a well-defined value of 0.
Be aware that if you make x non-constexpr the compiler won't be required to diagnose the undefined behavior and to print a diagnostic message anymore. It is therefore likely to compile.
Subtracting unrelated pointers is still undefined behavior though, not only unspecified behavior. Therefore you will lose any guarantee on what the resulting program will do. It does not only mean that you will get different values for x in each compilation/execution of the code.
[1] Core undefined behavior here refers to undefined behavior in the core language, in contrast with undefined behavior due to use of the standard library. It is unspecified whether undefined behavior as specified for the library causes an (otherwise constant) expression to not be a constant expression, see final sentence before the example in [expr.const]/2.
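To make the distinction concrete, here is a small sketch of which pointer operations are usable in constant expressions. The names table, first_object and second_object are invented for this illustration and are not taken from the question.

```cpp
#include <cstddef>

// Two pointers into the same array: subtraction is well-defined,
// so the result is a constant expression.
constexpr int table[4] = {10, 20, 30, 40};
constexpr std::ptrdiff_t distance = &table[3] - &table[1];
static_assert(distance == 2, "elements 1 and 3 are two apart");

// Two unrelated objects: equality comparison is well-defined (they are
// distinct objects), so it is still usable at compile time.
int first_object = 0;
int second_object = 0;
static_assert(&first_object != &second_object, "distinct objects compare unequal");

// Subtracting them, however, would be undefined behavior and is therefore
// rejected as a constexpr initializer:
// constexpr auto bad = &first_object - &second_object;  // error: not a constant expression
```

So unrelated pointers can be tested for equality at compile time, but they cannot be turned into a numeric distance.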

Are user-defined-literals resolved at compile-time or runtime?

I wonder, because predefined literals like ULL, f, etc. are obviously resolved at compile time. The standard (2.14.8 [lex.ext]) doesn't seem to define this, but it seems to tend towards runtime:
[2.14.8 / 2]
A user-defined-literal is treated as a call to a literal operator or literal operator template (13.5.8). To
determine the form of this call for a given user-defined-literal L with ud-suffix X, the literal-operator-id
whose literal suffix identifier is X is looked up in the context of L using the rules for unqualified name
lookup (3.4.1). Let S be the set of declarations found by this lookup. S shall not be empty.
(emphasis mine.)
However, to me this seems to introduce unnecessary runtime-overhead, as literals can only be appended to values that are available at compile-time anyways like 13.37f or "hello"_x (where _x is a user-defined-literal).
Then, we got the templated user-defined-literal, that never really gets defined in the standard AFAICS (i.e., no example is given, please prove me wrong). Is that function somehow magically invoked at compile time or is it still runtime?

Yes, you get a function call. But function calls can be compile time constant expressions because of constexpr literal operator functions.
For an example, see this one. As another example to show the advanced form of constexpr computations allowed by the FDIS, to have compile time base-26 literals you can do
#include <cstddef> // std::size_t
typedef unsigned long long ull;
constexpr ull base26(char const *s, ull ps) {
return (*s && !(*s >= 'a' && *s <= 'z')) ? throw "bad char!" :
(!*s ? ps : base26(s + 1, (ps * 26ULL) + (*s - 'a')));
}
constexpr ull operator "" _26(char const *s, std::size_t len) {
return base26(s, 0);
}
Saying "bcd-"_26 will evaluate a throw-expression, and thereby cause the return value to become non-constant. In turn, it causes any use of "bcd-"_26 as a constant expression to become ill-formed, and any non-constant use to throw at runtime. The allowed form "bcd"_26 evaluates to a constant expression of the respective computed value.
Note that reading from string literals is not explicitly allowed by the FDIS, however it presents no problem and GCC supports this (the character lvalue reference is a constant expression and the character's value is known at compile time). IMO if one squints, one can read the FDIS as allowing this.
Then, we got the templated user-defined-literal, that never really gets defined in the standard AFAICS (i.e., no example is given, please prove me wrong)
The treatment of literals as invoking literal operator templates is defined in 2.14.8. You find more examples at 13.5.8 that go into detail on the literal operator functions and function templates themselves.
Is that function somehow magically invoked at compile time or is it still runtime?
The keyword is function invocation substitution. See 7.1.5.

@Johannes S is correct of course, but I'd like to add clearly (since I faced this) that even for constexpr user-defined literals, the parameters are not considered constexpr or compile-time constants, for example in the sense that they cannot be used as integer constants for templates.
In addition, only things like this will actually give compile-time evaluation:
inline constexpr long long operator "" _xx(unsigned long long v) {
return (v > 100 ) ? throw std::exception() : v;
}
constexpr auto a= 150_xx;
So, that will not compile. But this will:
cout << 150_xx << endl;
And the following is not allowed:
inline constexpr long long operator "" _xx(unsigned long long v) {
return some_trait<v>::value;
}
That's annoying, but natural considering that (other) constexpr functions can be called also during execution.
Only for integer user-defined literals is it possible to force compile-time processing, by using the template form. Examples in my question and self answer: https://stackoverflow.com/a/13869688/1149664
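The template form mentioned above receives the literal's characters as template arguments, so they really are compile-time constants. The following is a minimal sketch and not the code from the linked answer: the suffix _ic and the helper parse_decimal are hypothetical names, and only plain decimal digits are handled.

```cpp
#include <type_traits>

// Hypothetical helper: folds the digit characters into a value at compile time.
// Base case: no characters left, Acc holds the result.
template <unsigned long long Acc, char... Digits>
struct parse_decimal : std::integral_constant<unsigned long long, Acc> {};

// Recursive case: consume one digit and continue with the rest.
template <unsigned long long Acc, char D, char... Rest>
struct parse_decimal<Acc, D, Rest...>
    : parse_decimal<Acc * 10 + static_cast<unsigned long long>(D - '0'), Rest...> {
    static_assert(D >= '0' && D <= '9', "only decimal digits are supported");
};

// Literal operator template: 150_ic calls operator""_ic<'1', '5', '0'>().
// The value is encoded in the return type, so it can feed other templates.
template <char... Digits>
constexpr std::integral_constant<unsigned long long,
                                 parse_decimal<0, Digits...>::value>
operator""_ic() {
    return {};
}

// The value is part of the type, hence always available at compile time.
static_assert(decltype(150_ic)::value == 150, "parsed during compilation");
```

A range check such as the v > 100 test from the earlier snippet could be expressed as another static_assert inside the operator, which would turn an out-of-range literal into a compile error in every context rather than a run-time exception.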
