Are non-terminating constexpr functions well-defined? - c++

Consider the following code:
constexpr unsigned f(unsigned x)
{
    while (x & 1) x *= 3;
    return x;
}

int main()
{
    char a[f(2)];
    char b[f(1)];
}
In case it isn't obvious: for odd integers x, the function f never terminates.
When I compile the above program with clang on coliru, b seems to be treated as a VLA, but a is not:
warning: variable length arrays are a C99 feature [-Wvla-extension]
char b[f(1)];
Is there a well-defined limit at which the compiler decides to stop evaluation of a constant expression? Or would it be perfectly fine for a conforming compiler to go into an infinite loop? Does f(1) yield UB?

There are a number of conditions which mean that an expression is not a core constant expression. One of them is:
-- an invocation of a constexpr function or a constexpr constructor that would exceed the implementation-defined recursion limits;
(the fifth bullet in §5.19/2). So the limit is implementation-defined.
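A minimal sketch of how to make that limit observable: forcing constant evaluation of the non-terminating case obliges the compiler either to finish the loop or to diagnose that its implementation-defined evaluation budget was exceeded. The flag names in the comment are the knobs clang and GCC expose for these limits, to the best of my knowledge, and are not mandated by the standard.

constexpr unsigned f(unsigned x)
{
    while (x & 1) x *= 3;   // never terminates for odd x
    return x;
}

// Requiring a constant expression here means the compiler must either finish
// the evaluation or report that an implementation-defined limit was exceeded
// (tunable via e.g. clang's -fconstexpr-steps or GCC's -fconstexpr-loop-limit).
constexpr unsigned forced = f(1);   // expected to be rejected once the limit is hit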

Related

Constexpr call from within another constexpr

I know that such questions have been asked before (for example, non-constexpr calls in constexpr functions), but consider the following code:
consteval int factorial(int n)
{
    return n <= 1 ? 1 : (n * factorial(n - 1));
}
factorial(5);
Everything is fine. We are guaranteed that the expression factorial(5) is resolved at compile time, because the function is consteval. Right? If so, I think that should mean the recursive call factorial(n - 1) inside the call factorial(5) is resolved at compile time too. However, we also know that in the declaration int factorial(int n) the parameter int n is just a variable, not constexpr. And this matters if we try to do something like this:
consteval int factorial(int n)
{
    // 1
    constexpr auto res = factorial(n - 1); // error: ‘n’ is not a constant expression
    // 2
    return n <= 1 ? 1 : (n * factorial(n - 1)); // hhhhmmmmmm...but all is ok..
}
factorial(5);
So what do we have?
1. We call a consteval function with a literal constant. OK.
2. Within the consteval function we make a recursive call to the same function with the non-constexpr parameter n at row 2, and it is accepted, even though we are calling a consteval function with a non-constexpr value. We might guess that the compiler knows the outer call was a proper consteval call, factorial(5), and that the whole resulting expression (including all the code inside factorial) should therefore be treated as consteval. Is that it? Or why?
3. At row 1 we explicitly make the call constexpr with a non-constexpr value, and we get an error.
My question is: why, for the explicit consteval call factorial(5), does the compiler treat the explicit and the implicit constexpr recursive calls of factorial differently? Is it a bug or a feature?
Let's review what a constant expression is. A core constant expression is an expression which, when evaluated, does not cause one of a long list of "bad" behaviors. A constant expression is a core constant expression whose result is "allowed" by some other rules (not important here). In particular, note that these conditions are heavily non-syntactic: constant expressions are not defined positively by listing which expressions are constant expressions, but negatively by defining what constant expressions can't do.
A result of this definition is that an expression can be a constant expression even if it requires the evaluation of many non-constant expressions (even non-core constant expressions). In the definitions
consteval int factorial1(int n) {
    if (n == 0) return 1;
    else { // making this correct since undefined behavior interferes with constant expressions
        /*constexpr*/ auto rec = factorial1(n - 1);
        return n * rec;
    }
}

consteval int factorial2(int n) {
    return n == 0 ? 1 : n * factorial2(n - 1);
}
the factorial1(n - 1) in factorial1 is not a constant expression, so adding constexpr to rec is an error. Similarly, the n == 0 ? 1 : n * factorial2(n - 1) in factorial2 is also not a constant expression. The reason is the same: both of these expressions read the value of (perform lvalue-to-rvalue conversion on) the object n, which did not start its lifetime within the expression. But this is fine: the bodies of constexpr/consteval functions are simply not checked for being constant expressions. All constexpr really does is whitelist a function's calls for appearing in constant expressions. And, again, an expression can be constant (like factorial1(5)) even if you need to evaluate a non-constant expression on the way (like factorial1(n - 1)). (In this case, when evaluating factorial1(5), the lifetime of the n object that is the parameter to factorial1 does start within the expression being checked, so it can be read during evaluation.)
Two places where an expression will be checked for being a constant expression are initializations of constexpr variables and "non-protected" calls to consteval functions. The first one explains why adding constexpr to rec in factorial1 is an error: you're adding an additional check for a constant expression that is not done in the correct factorial1 function, and this extra check (correctly) fails. This should have answered your point 3.
For your point 2: yes, there's a special "protection" for consteval functions called from other consteval functions. Usually, a call to a consteval function is, right at the point it is written, checked for being a constant expression. As we've been discussing, this check would fail for the calls factorial1(n - 1) and factorial2(n - 1) in the above definitions. There is a special case built into the language to save them: a call to a consteval function in an immediate function context (basically, whose immediately enclosing function is also consteval) is not required to be a constant expression.
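A minimal sketch of that special case (the function names id, outer, and runtime are made up for illustration): a call to a consteval function from inside another consteval function is not checked at the call site, but the same call from an ordinary function is.

consteval int id(int n) { return n; }

consteval int outer(int n) {
    // Immediate function context: this call is not itself required to be a
    // constant expression, so reading the parameter n is fine.
    return id(n);
}

int runtime(int n) {
    // Not an immediate function context: the call id(n) would have to be a
    // constant expression right here, and n is not usable in one, so the
    // commented-out line would be ill-formed.
    // return id(n);
    return n;
}

int main() {
    return outer(3); // the full expression outer(3) is a constant expression
}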

constexpr: gcc is trying harder to eval constexpr than clang

I'm using godbolt to see generated code with gcc and clang.
I tried to implement the djb2 hash.
gcc always tries its best to evaluate the constexpr function.
clang evaluates it at compile time only if the variable is constexpr.
Let's see the example:
constexpr int djb2(char const *str)
{
    int hash = 5381;
    int c = 0;
    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}

int main()
{
    int i = djb2("hello you :)");
}
With this example, gcc evaluates i at compile time, but clang does it at run time.
If I add constexpr to i, clang also evaluates it at compile time.
Do you know if the standard says something about that?
EDIT: thanks to all. So, as I understand it, without constexpr the compiler does whatever it wants; with constexpr, the compiler is forced to evaluate the constant.
Your program has undefined behavior.
The shift hash << 5 will overflow, which has undefined behavior for signed integer types before C++20.
In particular that means that calling your function can never yield a constant expression, which you can verify by adding constexpr to your declaration of i. Both compilers will then have to diagnose the undefined behavior and will tell you about it.
Give hash an unsigned type and your code will actually have well-defined behavior, and the expression djb2("hello you :)") will actually be a constant expression that can be evaluated at compile time, assuming you are using C++14 or later (the loop was not allowed in a constexpr function in C++11).
This still doesn't require the compiler to actually do the evaluation at compile-time, but then you can force it by adding constexpr to the declaration of i.
"Force" here is relative. Because of the as-if rule and because there is no observable difference between evaluation at compile-time and runtime, the compiler is still not technically required to really do the computation only at compile-time, but it requires the compiler to check the whole calculation for validity, which is basically the same as evaluating it, so it would be unreasonable for the compiler to repeat the evaluation at runtime.
Similarly "can be evaluated at compile-time" is relative as well. Again for the same reasons as above, a compiler can still choose to do the calculations at compile-time even if it is not a constant expression as long as there wouldn't be any observable difference in behavior. This is purely a matter of optimizer quality. In your specific case the program has undefined behavior, so the compilers can choose to do what they want anyway.
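A minimal corrected sketch along those lines (my variant, not code from the question): hash is unsigned so the shift cannot overflow, and constexpr on i requires the initializer to be a constant expression; assumes C++14 or later.

constexpr unsigned djb2(char const *str)
{
    unsigned hash = 5381;
    while (unsigned c = static_cast<unsigned char>(*str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c, wraps harmlessly */
    return hash;
}

int main()
{
    // constexpr forces the initializer to be a constant expression
    constexpr unsigned i = djb2("hello you :)");
    return static_cast<int>(i & 1); // just to use the value
}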

Why does a consteval function allow undefined behavior?

There is a very neat property of constant expressions in C++: their evaluation cannot have undefined behavior (7.7.4.7):
An expression e is a core constant expression unless the evaluation of e, following the rules of the abstract machine ([intro.execution]), would evaluate one of the following:
...
an operation that would have undefined behavior as specified in [intro] through [cpp] of this document [ Note: including, for example, signed integer overflow ([expr.prop]), certain pointer arithmetic ([expr.add]), division by zero, or certain shift operations — end note ] ;
Trying to store the value of 13! in a constexpr int indeed yields a nice compile error:
constexpr int f(int n)
{
    int r = n--;
    for (; n > 1; --n) r *= n;
    return r;
}

int main()
{
    constexpr int x = f(13);
    return x;
}
Output:
9:19: error: constexpr variable 'x' must be initialized by a constant expression
constexpr int x = f(13);
^ ~~~~~
4:26: note: value 3113510400 is outside the range of representable values of type 'int'
for (; n > 1; --n) r *= n;
^
9:23: note: in call to 'f(3)'
constexpr int x = f(13);
^
1 error generated.
(BTW why does the error say "call to 'f(3)'", while it is a call to f(13)?..)
Then, I remove constexpr from x, but make f a consteval. According to the docs:
consteval - specifies that a function is an immediate function, that is, every call to the function must produce a compile-time constant
I do expect that such a program would again cause a compile error. But instead, the program compiles and runs with UB.
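For reference, a sketch of the variant being described (consteval f, plain int x), which a conforming compiler would be expected to reject:

consteval int f(int n)
{
    int r = n--;
    for (; n > 1; --n) r *= n;
    return r;
}

int main()
{
    // Every call to a consteval function must be a constant expression;
    // the signed overflow inside f(13) means this one is not, so the call
    // should be diagnosed rather than compiled and run.
    int x = f(13);
    return x;
}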
Why is that?
UPD: Commenters suggested that this is a compiler bug. I reported it: https://bugs.llvm.org/show_bug.cgi?id=43714
This is a compiler bug. Or, to be more precise, this is an "underimplemented" feature (see the comment in bugzilla):
Yup - seems consteval isn't implemented yet, according to: https://clang.llvm.org/cxx_status.html
(the keyword's probably been added but not the actual implementation support)

Why is 0 == ("abcde"+1) not a constant expression?

Why doesn't the following code compile?
// source.cpp
int main()
{
    constexpr bool result = (0 == ("abcde"+1));
}
The compile command:
$ g++ -std=c++14 -c source.cpp
The output:
source.cpp: In function ‘int main()’:
source.cpp:4:32: error: ‘((((const char*)"abcde") + 1u) == 0u)’ is not a constant expression
constexpr bool result = (0 == ("abcde"+1));
~~~^~~~~~~~~~~~~~~
I'm using gcc6.4.
The restrictions on what can be used in a constant expression are defined mostly as a list of negatives. There's a bunch of things you're not allowed to evaluate ([expr.const]/2 in C++14) and certain things that values have to result in ([expr.const]/4 in C++14). This list changes from standard to standard, becoming more permissive with time.
In trying to evaluate:
constexpr bool result = (0 == ("abcde"+1));
there is nothing that we're not allowed to evaluate, and we don't have any results that we're not allowed to have. No undefined behavior, etc. It's a perfectly valid, if odd, expression. Just one that gcc 6.3 happens to disallow - which is a compiler bug. gcc 7+, clang 3.5+, msvc all compile it.
There seems to be a lot of confusion around this question, with many comments suggesting that since the value of a string literal like "abcde" is not known until runtime, you cannot do anything with such a pointer during constant evaluation. It's important to explain why this is not true.
Let's start with a declaration like:
constexpr char const* p = "abcde";
This pointer has some value. Let's say N. The crucial thing is - just about anything you can do to try to observe N during constant evaluation would be ill-formed. You cannot cast it to an integer to read the value. You cannot compare it to a different, unrelated string† (by way of [expr.rel]/4.3):
constexpr char const* q = "hello";
p > q; // ill-formed
p <= q; // ill-formed
p != q; // ok, false
We can say for sure that p != q because wherever it is they point, they are clearly different. But we cannot say which one goes first. Such a comparison is undefined behavior, and undefined behavior is disallowed in constant expressions.
You can really only compare to pointers within the same array:
constexpr char const* a = p + 1; // ok
constexpr char const* b = p + 17; // ill-formed
a > p; // ok, true
Wherever it is that p points to, we know that a points after it. But we don't need to know N to determine this.
As a result, the actual value N during constant evaluation is more or less immaterial.
"abcde" is... somewhere. "abcde"+1 points one element past that, i.e. at "bcde". Regardless of where it points, you can compare it to a null pointer (0 is a null pointer constant), and it is not a null pointer, hence that comparison evaluates to false.
This is a perfectly well-formed constant evaluation, which gcc 6.3 happens to reject.
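Pulling those observations together into a compilable sketch (the variable names here are mine, added for illustration):

constexpr char const* p = "abcde";

constexpr bool not_null   = (p + 1 != nullptr); // ok: true; the question's 0 == ("abcde"+1) is just its negation
constexpr bool same_array = (p + 1 > p);        // ok: ordering within one array is well-defined
// constexpr bool cross   = (p > "hello");      // not a constant expression: ordering of unrelated objects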
†Although we simply state by fiat that std::less()(p, q) provides some value that gives a consistent total order at compile time and that it gives the same answer at runtime. Which is... an interesting conundrum.

Is over/underflow an undefined behavior at execution time?

I was reading about undefined behavior, and I'm not sure if it's a compile-time-only feature, or if it can occur at execution time.
I understand this example well (this is extracted from the Undefined Behavior page of Wikipedia):
An example for the C language:
int foo(unsigned x)
{
    int value = 5;
    value += x;
    if (value < 5)
        bar();
    return value;
}
The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that at the line of the if check value >= 5. Thus the if and the call to the function bar can be ignored by the compiler since the if has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:
int foo(unsigned x)
{
    int value = 5;
    value += x;
    return value;
}
But this occurs at compilation-time.
What if I write, for example:
void foo(int x) {
    if (x + 150 < 5)
        bar();
}

int main() {
    int x;
    std::cin >> x;
    foo(x);
}
and then the user types in INT_MAX - 100 ("2147483547", for a 32-bit int).
There will be an integer overflow, but AFAIK it is the arithmetic logic unit of the CPU that overflows, so the compiler is not involved here.
Is it still undefined behavior?
If yes, how does the compiler detect the overflow?
The best I could imagine is the overflow flag of the CPU. If that is the case, does it mean that the compiler can do anything it wants if the overflow flag of the CPU is set at any point during execution?
Yes, but not necessarily in the way I think you might have meant it. If the machine code contains an addition and at runtime that addition wraps (or otherwise overflows, but on most architectures it would wrap), that is not UB by itself. The UB is solely in the domain of C (or C++). That addition may be adding unsigned integers, or it may be the result of an optimization the compiler can make because it knows the semantics of the target platform and can safely use transformations that rely on wrapping (but you cannot, unless of course you do it with unsigned types).
Of course that does not at all mean that it is safe to use constructs that "wrap only at runtime", because those code paths are poisoned at compile time as well. For example, your function
extern void bar(void);

void foo(int x) {
    if (x + 150 < 5)
        bar();
}
is compiled by GCC 6.3 targeting x64 to:
foo:
        cmp     edi, -145
        jl      .L4
        ret
.L4:
        jmp     bar
which is equivalent to:
void foo(int x) {
    if (x < -145)
        bar(); // with tail call optimization
}
... which is the same as the original if you assume that signed integer overflow is impossible (in the sense that it places an implicit precondition on the inputs that overflow will not happen).
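If you want the check to be well-defined for every possible int input, one option (my sketch, not part of the quoted answer) is to do by hand exactly the rearrangement the compiler performed:

extern void bar(void);

void foo(int x) {
    // No addition on x, so no signed overflow is possible; this behaves like
    // x + 150 < 5 for every input where that addition is defined, and is also
    // defined for the inputs where it is not.
    if (x < 5 - 150)
        bar();
}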
Your analysis of the first example is incorrect. value += x; is equivalent to:
value = value + x;
In this case value is int and x is unsigned, so the usual arithmetic conversion means that value is first converted to unsigned, so we have an unsigned addition which by definition cannot overflow (it has well-defined semantics in accordance with modular arithmetic).
When the unsigned result is assigned back to value, if it is larger than INT_MAX then this is an out-of-range assignment which has implementation-defined behaviour. This is NOT overflow because it is assignment, not an arithmetic operation.
Which optimizations are possible therefore depends on how the implementation defines the behaviour of out-of-range assignment for integers. Modern systems all take the value which has the same 2's complement representation, but historically other systems have done some different things.
So the original example does not have undefined behaviour in any circumstance, and the suggested optimization is, for most systems, not possible.
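To make the conversion chain in the first example explicit, here is a small sketch (my illustration) of what value += x does under the usual arithmetic conversions, per this answer's reading:

int demo(unsigned x) {
    int value = 5;
    // value += x is equivalent to
    //   value = static_cast<int>(static_cast<unsigned>(value) + x);
    // The unsigned addition wraps modulo 2^N and cannot overflow; only the
    // conversion back to int is implementation-defined (not undefined) when
    // the mathematical result exceeds INT_MAX.
    value += x;
    return value;
}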
Your second example has nothing to do with your first example since it does not involve any unsigned arithmetic. If x > INT_MAX - 150 then the expression x + 150 causes undefined behaviour due to signed integer overflow. The language definition does not mention ALUs or CPUs so we can be certain that those things are not related to whether or not the behaviour is undefined.
If yes, how does the compiler detect the overflow?
It doesn't have to. Precisely because the behaviour is undefined, it means the compiler is not constrained by having to worry about what happens when there is overflow. It only has to emit an executable that exemplifies the behaviour for the cases which are defined.
In this program those are the inputs in the range [INT_MIN, INT_MAX-150] and so the compiler can transform the comparison to x < -145 because that has the same behaviour for all inputs in the well-defined range, and it doesn't matter about the undefined cases.