Constant evaluation of self-assignment in member initialization - c++

In the following program, the constexpr function foo() creates an object of type A with the field x=1, then constructs another object on top of it using std::construct_at, whose member is initialized by the default member initializer x=x. Finally, the constant-evaluated value is printed:
#include <memory>
#include <iostream>
struct A {
    int x = x;
};
constexpr int foo() {
    A a{1};
    std::construct_at<A>(&a);
    return a.x;
}
constexpr int v = foo();
int main() {
    std::cout << v;
}
GCC prints 1 here, while Clang and MSVC both print 0. Only Clang issues a warning: field 'x' is uninitialized when used. Demo: https://gcc.godbolt.org/z/WTsxdrj8e
Is there an undefined behavior in the program? If yes, why does no compiler detect it during constant evaluation? If no, which compiler is right?

C++20 [basic.life]/1.5 states that the lifetime of an object (in this case, the object a) ends when
the storage which the object occupies is released, or is reused by an object that is not nested within o (6.7.2).
The standard isn't totally clear about when exactly the memory is considered "reused" (and thus, the old A's lifetime ends) but [intro.object]/1 states that
... An object occupies a region of storage in its period of construction (11.10.5), throughout its lifetime (6.7.3), and in its period of destruction (11.10.5).
In my opinion, the evaluation of the default member initializer = x is something that happens during the "period of construction" of the new A object, and that means that at that point, the new A object has already come into existence (but its lifetime has not yet begun), and the old object's lifetime has already ended. That means the initialization of the new A reads the value of its x member, whose lifetime has not begun because its initialization is not complete, which violates [basic.life]/7.1 and would be UB.
In C++20, the definition of foo violates [dcl.constexpr]/6:
A constexpr function that is neither defaulted nor a template is ill-formed, no diagnostic required, if it is not possible for an evaluation of an invocation of the function to be performed while evaluating any valid manifestly constant-evaluated expression.
This means compilers are not required to issue a diagnostic for your program.
In C++23, this rule will be abolished (see P2448) so you can argue that compilers must issue a diagnostic if they claim C++23 compliance. However, no compiler has ever been able to diagnose all kinds of core language UB in constant expressions (for example, something that seems particularly difficult to diagnose is unsequenced writes or an unsequenced read and write involving the same scalar object) so don't hold your breath for it to be fixed.

Related

constexpr result from non-constexpr call

I was recently surprised to find that the following code compiles with Clang, GCC and MSVC (at least with their current versions):
struct A {
    static const int value = 42;
};
constexpr int f(A a) { return a.value; }
void g() {
    A a; // Intentionally non-constexpr.
    constexpr int kInt = f(a);
}
My understanding was that the call to f is not constexpr because the argument a isn't, but it seems I am wrong. Is this standard-conforming code or some kind of compiler extension?
As mentioned in the comments, the rules for constant expressions do not generally require that every variable mentioned in the expression and whose lifetime began outside the expression evaluation is constexpr.
There is a (long) list of requirements that when not satisfied prevent an expression from being a constant expression. As long as none of them is violated, the expression is a constant expression.
The requirement that a used variable/object be constexpr is formally known as the object being usable in constant expressions (although the exact definition contains more detailed requirements and exceptions; see also the cppreference page on constant expressions).
Looking at the list, you can see that this property is required only in certain situations, namely only for variables/objects whose lifetime began outside the expression, and only if a virtual function call is performed on it, an lvalue-to-rvalue conversion is performed on it, or it is a reference variable named in the expression.
None of these cases applies here. There are no virtual functions involved, and a is not a reference variable. Typically, it is the lvalue-to-rvalue conversion that makes the requirement important. An lvalue-to-rvalue conversion happens whenever you try to use the value stored in the object or one of its subobjects. However, A is an empty class without any state, so there is no value to read. When passing a to the function, the implicit copy constructor is called to construct the parameter of f, but because the class is empty, it doesn't actually do anything. It doesn't access any state of a.
Note that, as mentioned above, the rules are stricter if you use references, e.g.
A a;
A& ar = a;
constexpr int kInt = f(ar);
will fail, because ar names a reference variable which is not usable in constant expressions. This inconsistency will hopefully be fixed soon (see https://github.com/cplusplus/papers/issues/973).
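A minimal sketch of the distinction drawn above, assuming C++17 or later. The struct B and functions h, use_empty and use_nonempty are hypothetical, not from the question: B adds a non-static data member, so copying a B performs an lvalue-to-rvalue conversion on b.n, and the analogous call is no longer a constant expression.

```cpp
#include <cassert>

struct A {
    static const int value = 42;  // static member: not part of A's state
};

// Hypothetical class with a non-static data member, for contrast.
struct B {
    int n = 0;
    static const int value = 42;
};

constexpr int f(A a) { return a.value; }
constexpr int h(B b) { return b.value; }

int use_empty() {
    A a;                        // intentionally not constexpr
    constexpr int k = f(a);     // OK: copying an empty A reads no value
    return k;
}

int use_nonempty() {
    B b;                        // not constexpr either
    // constexpr int k = h(b);  // error: copying b reads b.n (lvalue-to-rvalue conversion)
    int k = h(b);               // fine as an ordinary run-time call
    return k;
}
```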

Why is `std::is_constant_evaluated()` false for this constant-initialized variable?

Note 2 to [expr.const]/2 implies that if we have a variable o such that:
the full-expression of its initialization is a constant expression when interpreted as a constant-expression, except that if o is an object, that full-expression may also invoke constexpr constructors for o and its subobjects even if those objects are of non-literal class types
then:
Within this evaluation, std::is_constant_evaluated() [...] returns true.
Consider:
#include <type_traits>
int main() {
    int x = std::is_constant_evaluated();
    return x;
}
This program returns 0 when executed.
However, I don't see how the full-expression of the initialization of x is not a constant expression. I do not see anything in [expr.const] that bans it. Therefore, my understanding of the note (which is probably wrong) implies that the program should return 1.
Now, if we look at the normative definition of std::is_constant_evaluated, it is only true in a context that is "manifestly constant-evaluated", and the normative definition of the latter, [expr.const]/14, makes it clearer that the program above should return 0. Specifically, the only item that we really need to look at is the fifth one:
the initializer of a variable that is usable in constant expressions or has constant initialization ...
x is not usable in constant expressions, and it doesn't have constant initialization because no automatic variable does.
So there are two possibilities here. The more likely one is that I haven't understood the note, and I need someone to explain to me why the note does not imply that the program should return 1. The less likely one is that the note contradicts the normative wording.
The full quote here is
A variable or temporary object o is constant-initialized if
(2.1) either it has an initializer or its default-initialization results in some initialization being performed, and
(2.2) the full-expression of its initialization is a constant expression when interpreted as a constant-expression, except that if o is an object, that full-expression may also invoke constexpr constructors for o and its subobjects even if those objects are of non-literal class types. [Note 2: Such a class can have a non-trivial destructor. Within this evaluation, std::is_constant_evaluated() ([meta.const.eval]) returns true. — end note]
The tricky bit here is that the term "is constant-initialized" (note: not "has constant initialization") doesn't mean anything by itself (it probably should be renamed to something else). It's used in exactly three other places, two of which I'll quote below; the last one ([dcl.constexpr]/6) isn't really relevant.
[expr.const]/4:
A constant-initialized potentially-constant variable V is usable in constant expressions at a point P if V's initializing declaration D is reachable from P and [...].
[basic.start.static]/2:
Constant initialization is performed if a variable or temporary object with static or thread storage duration is constant-initialized ([expr.const]).
Let's replace "constant-initialized" with something less confusing, like "green".
So
A green potentially-constant variable is usable in constant expressions if [some conditions are met]
Constant initialization is performed if a variable or temporary object with static or thread storage duration is green.
Outside of these two cases, the greenness of a variable doesn't matter. You can still compute whether it is green, but that property has no effect. It's an academic exercise.
Now go back to the definition of greenness, which says that a variable or temporary object is green if (among other things) "the full-expression of its initialization is a constant expression when interpreted as a constant-expression", with some exceptions. And the note says that during this hypothetical evaluation to determine the greenness of the variable, is_constant_evaluated() returns true, which is entirely correct.
So going back to your example:
int main() {
    int x = std::is_constant_evaluated();
    return x;
}
Is x green? Sure, it is. But it doesn't matter. Nothing cares about its greenness, since x is neither static nor thread local nor potentially-constant. And the hypothetical computation done to determine whether x is green has nothing to do with how it is actually initialized, which is governed by other things in the standard.

Contradictory definitions about the Order of Constant Initialization and Zero Initialization in C++

I have been trying to understand how static variables are initialized, and I noticed a contradiction about the order of constant initialization and zero initialization between cppref and enseignement.
At cppref it says:
Constant initialization is performed instead of zero initialization of the static and thread-local (since C++11) objects and before all other initialization.
Whereas at enseignement it says:
Constant initialization is performed after zero initialization of the static and thread-local objects and before all other initialization.
As you can see, cppref uses "instead" while the second site uses "after". Which of the two is correct? That is, does zero initialization always happen first, followed by constant initialization where possible (as implied by the second site), or is it the other way round?
The example given there is as follows:
#include <iostream>
#include <array>
struct S {
    static const int c;
};
const int d = 10 * S::c; // not a constant expression: S::c has no preceding
                         // initializer, this initialization happens after const
const int S::c = 5;      // constant initialization, guaranteed to happen first
int main()
{
    std::cout << "d = " << d << '\n';
    std::array<int, S::c> a1; // OK: S::c is a constant expression
//  std::array<int, d> a2;    // error: d is not a constant expression
}
This is my understanding of the initialization process so far:
1. Static initialization happens first. This includes:
   constant initialization, if possible;
   zero initialization, only if constant initialization was not done.
2. Dynamic initialization.
According to my understanding above, this is how the code works:
Step 1.
When the control flow reaches the definition of const int d, it sees that the initializer contains a variable (namely S::c) that has not yet been initialized. So the statement const int d = 10 * S::c; is a dynamic (run-time) initialization. This means it can only happen after static initialization. And of course d is not a constant expression.
Step 2.
The control flow reaches the definition const int S::c = 5;. In this case, however, the initializer is a constant expression, so constant initialization can happen, and there is no need for zero initialization.
Step 3.
Note that we (the compiler) still haven't initialized the variable d, because its initialization was deferred: it has to be done dynamically. This now takes place, and d gets the value 50. However, d still isn't a constant expression, so we cannot use it where a constant expression is required.
Is my analysis/understanding of the concept correct and the code behaves as described?
Note:
The order of constant initialization and zero initialization is also different at cppref-init and enseignement-init.
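For comparison, a sketch of the same example with the initializer of S::c moved into the class definition (a hypothetical change, not part of the cppreference example). S::c now has a preceding initializer when d is defined, so d's initializer is a constant expression, d gets constant initialization, and it becomes usable where a constant expression is required.

```cpp
#include <array>
#include <cassert>

struct S {
    // Hypothetical change: the initializer now appears in the declaration,
    // so S::c is usable in constant expressions from here on.
    static const int c = 5;
};

// S::c now has a preceding initializer, so this initializer is a constant
// expression: d gets constant initialization instead of dynamic initialization.
const int d = 10 * S::c;

static_assert(d == 50, "d is now a constant expression");

// Both are now accepted where a constant expression is required.
std::array<int, S::c> a1{};
std::array<int, d> a2{};
```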
When in doubt, turn to the standard. As this question is tagged with C++11, we'll refer to N3337.
[basic.start.init]/2:
Variables with static storage duration ([basic.stc.static]) or thread storage duration ([basic.stc.thread]) shall be zero-initialized ([dcl.init]) before any other initialization takes place.
Constant initialization is performed: [...]
Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. Static initialization shall be performed before any dynamic initialization takes place.
Thus, with regard to the C++11 Standard, enseignement's description is accurate.
Constant initialization is performed after zero initialization of the static and thread-local objects and before all other initialization.
However, this was flagged as a defect as per CWG 2026:
CWG agreed that constant initialization should be considered as happening instead of zero initialization in these cases, making the declarations ill-formed.
And as of C++17 (N4659) this was changed, and henceforth governed by [basic.start.static]/2:
[...] Constant initialization is performed if a variable or temporary object with static or thread storage duration is initialized by a constant initializer for the entity. If constant initialization is not performed, a variable with static storage duration or thread storage duration is zero-initialized. Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization.
But as this was not just a new standard feature but a defect fix, it applies retroactively to older standards. In the end, cppreference's description is accurate for C++11 as well (in fact all the way back to C++98), whereas enseignement's takes into account neither the modern C++ standard nor DR CWG 2026.
I have never visited enseignement's page myself, but after a quick glance at their reference page, it looks like a really old version of cppreference itself: either it was copied without attribution, or cppreference actually started as a collaboration with the enseignement authors.
Short of the standard itself, I consider cppreference to be the de facto reference, and given that enseignement is an outdated copy, I would recommend not turning to those pages for reference again. Indeed, if we visit the actual, up-to-date cppreference page on constant initialization and scroll to the very bottom, we can read:
Defect reports
The following behavior-changing defect reports were applied retroactively to previously published C++ standards.
[...]
CWG 2026
Applied to: C++98
Behavior as published: zero-init was specified to always occur first, even before constant-init
Correct behavior: no zero-init if constant init applies

Guarantee of deferred dynamic initialization of non odr-used global variable

Consider the following complete program consisting of two TU's:
// 1.cpp
bool init() { /* ... */ }
const auto _{init()};
// 2.cpp
int main() {}
Question: is there any guarantee that _ is initialized at some point (I do not care when)?
Now consider the program consisting of one TU:
// 1.cpp
bool init() { /* ... */ }
const auto _{init()};
int main() {}
Note that _ is not odr-used.
However, in the second case, can main() be said to be odr-used, since it is (sort of) "referred to by the implementation" when it is called as the program runs?
And if main() is odr-used, does this imply that _ is guaranteed to be initialized even if it's not odr-used?
EDIT:
This is what en.cppreference.com says about Deferred dynamic initialization:
If no variable or function is odr-used from a given translation unit, the non-local variables defined in that translation unit may never be initialized (this models the behavior of an on-demand dynamic library)
Can you answer my questions considering the above when reading my two examples?
It's supposedly the linker's job to collate all objects with static storage duration from all translation units for initialization during program initiation. However, it's a bit more than that: the guarantee is that those objects will be initialized before the use of any function within that translation unit.
basic.start.static/1: Variables with static storage duration are initialized as a consequence of program initiation....
Also see:
basic.stc.static/2: If a variable with static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused...
The object _ is guaranteed to be initialized. According to [basic.start.static]/1,
Variables with static storage duration are initialized as a consequence of program initiation. Variables with thread storage duration are initialized as a consequence of thread execution.
In case you were wondering whether that could be read as guaranteeing only that static initialization shall occur, and not that dynamic initialization shall occur, see [dcl.dcl]/11:
A definition causes the appropriate amount of storage to be reserved and any appropriate initialization (11.6) to be done.
Thus, all initialization required by the semantics of the initializer {init()} shall be performed on the object _.
As usual, the as-if rule applies. If init() has any observable behaviour, such behaviour must occur. If it has any side effects that affect observable behaviour, such side effects must occur.
The fact that _ is not odr-used is irrelevant. The tangent about main is irrelevant too.
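A minimal sketch of the guarantee discussed above, assuming C++17 (so that `const auto name{...}` deduces bool). The flag init_ran is hypothetical, and the unused variable is renamed from _ for readability. Because its initializer has side effects, [basic.stc.static]/2 forbids eliminating the variable even though nothing odr-uses it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical flag so the side effect of init() can be checked.
bool init_ran = false;

bool init() {
    init_ran = true;             // side effect that must not be optimized away
    std::puts("init() called");  // observable behaviour
    return true;
}

// Never odr-used anywhere in the program (renamed from _ in the question).
const auto unused_global{init()};
```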

Can initializing expression use the variable itself?

Consider the following code:
#include <iostream>
struct Data
{
    int x, y;
};
Data fill(Data& data)
{
    data.x=3;
    data.y=6;
    return data;
}
int main()
{
    Data d=fill(d);
    std::cout << "x=" << d.x << ", y=" << d.y << "\n";
}
Here d is copy-initialized from the return value of fill(), but fill() writes to d itself before returning its result. What I'm concerned about is that d is non-trivially used before being initialized, and use of uninitialized variables in some(all?) cases leads to undefined behavior.
So is this code valid, or does it have undefined behavior? If it's valid, will the behavior become undefined once Data stops being POD or in some other case?
This does not seem like valid code. It is similar to the case outlined in the question "Is passing a C++ object into its own constructor legal?", although in that case the code was valid. The mechanics are not identical, but the basic reasoning can at least get us started.
We start with defect report 363 which asks:
And if so, what is the semantics of the self-initialization of UDT?
For example
#include <stdio.h>
struct A {
    A() { printf("A::A() %p\n", this); }
    A(const A& a) { printf("A::A(const A&) %p %p\n", this, &a); }
    ~A() { printf("A::~A() %p\n", this); }
};
int main()
{
    A a=a;
}
can be compiled and prints:
A::A(const A&) 0253FDD8 0253FDD8
A::~A() 0253FDD8
and the proposed resolution was:
3.8 [basic.life] paragraph 6 indicates that the references here are valid. It's permitted to take the address of a class object before it is fully initialized, and it's permitted to pass it as an argument to a reference parameter as long as the reference can bind directly.
[...]
So although d is not fully initialized we can pass it as a reference.
Where we start to get into trouble is here:
data.x=3;
The draft C++ standard section 3.8 (the same section and paragraph that the defect report quotes) says (emphasis mine):
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
the glvalue is used to access a non-static data member or call a non-static member function of the object, or
the glvalue is bound to a reference to a virtual base class (8.5.3), or
the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
So what does access mean? That was clarified with defect report 1531 which defines access as:
access
to read or modify the value of an object
So fill accesses a non-static data member and hence we have undefined behavior.
This also agrees with section 12.7 which says:
[...] To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
Since you are making a copy anyway, you might as well create an instance of Data inside fill and initialize that. Then you avoid having to pass d at all.
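A minimal sketch of that suggestion: fill no longer takes a reference to the object being initialized, but builds and returns its own local Data. Changing fill's signature this way, and the helper make_data, are assumptions for illustration.

```cpp
#include <cassert>

struct Data {
    int x, y;
};

// Build the result locally instead of writing through a reference
// to an object whose initialization has not finished.
Data fill()
{
    Data data{};  // value-initialized, so x and y start at 0
    data.x = 3;
    data.y = 6;
    return data;
}

// Hypothetical caller: d is initialized from a fully constructed prvalue.
Data make_data()
{
    Data d = fill();
    return d;
}
```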
As pointed out by T.C., it is important to quote explicitly the details of when lifetime starts. From section 3.8:
The lifetime of an object is a runtime property of the object. An object is said to have non-trivial initialization if it is of a class or aggregate type and it or one of its members is initialized by a constructor other than a trivial default constructor. [ Note: initialization by a trivial copy/move constructor is non-trivial initialization. — end note ] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
The initialization is non-trivial since we are initializing via the copy constructor.
I don't see a problem. Accessing the uninitialized integer members is valid, because you're accessing for the purpose of writing. Reading them would cause UB.
I think it is valid (crazy, but valid).
This would be both legal and logically acceptable:
Data d ;
d = fill( d ) ;
and the fact is that this form is the same:
Data d = fill( d ) ;
As far as the logical structure of the language is concerned those two versions are equivalent.
So it's legal and logically correct for the language.
However, as we normally expect people to initialize variables to a default value when they create them (for safety), it is bad programming practice.
It is interesting that g++ -Wall compiles this code without a single warning.
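For reference, a sketch of the two-step form from this answer (the wrapper function two_step is hypothetical). Here d's lifetime has already begun when fill runs, because default-initializing this Data performs no initialization that could still be incomplete, so fill writes to a live object and only reads members it has just set.

```cpp
#include <cassert>

struct Data {
    int x, y;
};

Data fill(Data& data)
{
    data.x = 3;   // writes to members of an object that is already alive
    data.y = 6;
    return data;  // copies members that have just been set
}

Data two_step()
{
    Data d;       // default-initialized: members indeterminate, but d is alive
    d = fill(d);  // copy-assign the returned copy back into d
    return d;
}
```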