Does implicit object creation apply in constant expressions?

Does implicit object creation apply in constant expressions? - c++

#include <memory>
int main() {
constexpr auto v = [] {
std::allocator<char> a;
auto x = a.allocate(10);
x[2] = 1;
auto r = x[2];
a.deallocate(x, 10);
return r;
}();
return v;
}
Is the program ill-formed? Clang thinks so, GCC and MSVC don't: https://godbolt.org/z/o3bcbxKWz
Removing the constexpr I think the program is not ill-formed and has well-defined behavior:
By [allocator.members]/5 the call a.allocate(10) starts the lifetime of the char[10] array it allocates storage for.
According to [intro.object]/13 starting the lifetime of an array of type char implicitly creates objects in its storage.
Scalar types such as char are implicit lifetime types. ([basic.types.general]/9
[intro.object]/10 then says that objects of type char are created in the storage of the char[10] array (and their lifetime started) if that can give the program defined behavior.
Without beginning the lifetime of the char object at x[2], the program without constexpr would have undefined behavior due to the write to x[2] outside its lifetime, but the char object can be implicitly created due to the arguments above, making the program behavior well-defined to exit with status 1.
With constexpr, I am wondering if the program is ill-formed or not. Does implicit object creation apply in constant expressions?
According to [intro.object]/10 objects are implicitly created to give the program defined behavior, but does being ill-formed count as defined behavior?
If not, then the program should not be ill-formed because of implicit creation of the char object for x[2].
If yes, then the next question would be if it is unspecified whether the program is ill-formed or not, because [intro.object]/10 also says that it is unspecified which objects are implicitly created if multiple sets can give the program defined behavior.
From a language design perspective I would expect that implicit object creation is not supposed to happen in constant expressions, because verifying the (non-)existence of a set of objects making the constant expression valid is probably infeasible for a compiler in general.

2469. Implicit object creation vs constant expressions
It is not intended that implicit object creation, as described in 6.7.2 [intro.object] paragraph 10, should occur during constant expression evaluation, but there is currently no wording prohibiting it.

Clang is incorrect here. You've already cited all the parts in the spec that make it well-formed. std::allocator<T>::allocate is constexpr; you get a pointer to char*; allocator<T>::allocate creates an array of T; creating an array of char implicitly creates objects; accessing a char attempts to cause UB, but IOC prevents UB, so UB doesn't happen. Therefore: the code isn't allowed to be il-formed.
Clang claims full support for both IOC and constexpr allocators, so this code should work.
Does implicit object creation apply in constant expressions?
All expressions are core constant expressions unless [expr.const]/5 explicitly excludes it. Nothing there mentions operations which might be UB that determines which objects are created, so such operations must be included.
IOC prevents an expression from being UB.
I would expect that implicit object creation is not supposed to happen in constant expressions, because verifying the (non-)existence of a set of objects making the constant expression valid is probably infeasible for a compiler in general.
You have forgotten about the other restrictions on constexpr code. So long as [expr.const]/5 continues to explicitly forbid reinterpret_cast and conversions from void*, the number of ways you can abuse IOC is pretty limited. You cannot, for example, take the pointer returned by your allocate(10) call and convert it to an int*. So the compiler knows that the only objects that can be implicitly created in that storage are chars.
So at constant evaluation time, the compiler could just take the result of allocator<char>::allocate and create all the char members of that array immediately before returning it. There's no constexpr-valid way for you to take that storage and implicitly create anything other than chars.
And using allocator<T>::allocate when T isn't a byte-wise type will not implicitly create objects in that storage. So either you're just getting a pointer to an array of unformed elements, or you're getting a pointer to an array of byte-wise types.
I'd guess Clang forgot to check this particular case.

Related

Is reading a variable outside its lifetime during constant evaluation diagnosable?

Shall one expect a reliable failure of in constant evaluation if it reads a variable outside of its lifetime?
For example:
constexpr bool g() {
int * p = nullptr;
{
int c = 0;
p = &c;
}
return *p == 0;
};
int main() {
static_assert( g() );
}
Here Clang stops with the error
read of object outside its lifetime is not allowed in a constant expression
But GCC accepts the program silently (Demo).
Are both compilers within their rights, or GCC must fail the compilation as well?

GCC dropped the ball.
[expr.const]
5 An expression E is a core constant expression unless the
evaluation of E, following the rules of the abstract machine
([intro.execution]), would evaluate one of the following:
...
an operation that would have undefined behavior as specified in [intro] through [cpp];
...
Indirection via dangling pointer has undefined behavior.
[basic.stc.general]
4 When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
So the invocation of g() may not be a constant expression, and may not appear in the condition of a static_assert which must be constant evaluated.
The program is ill-formed.
The above quotes are from the C++20 standard draft, but C++17 has them too.

Shall one expect a reliable failure of in constant evaluation if it reads a variable outside of its lifetime?
Yes, but your example doesn't necessarily do this. Its behavior is implementation-defined.
When the block with the variable c exits ([basic.stc.auto]/1), the value of p becomes an invalid pointer value ([basic.stc.general]/4).
When *p is evaluated, the lvalue-to-rvalue conversion ([conv.lval]) is applied to p. And [conv.lval]/3 says:
The result of the conversion is determined according to the following rules:
...
— Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
So.
Are both compilers within their rights, or GCC must fail the compilation as well?
AFAIK neither of the implementations define its behavior here, but I think it could theoretically be defined in such a way that neither the conversion nor the rest of the evaluation would make g() not a constant expression.

Do all transient allocations have unique address?

While reading comments of a C++ Weekly video about the constexpr new support in C++20 I found the comment that alleges that C++20 allows UB in constexpr context.
At first I was convinced that comment is right, but more I thought about it more and more I began to suspect that C++20 wording contains some clever language that makes this defined behavior.
Either that all transient allocations return unique addresses or maybe some more general notion in C++ that makes 2 distinct allocation pointers always(even in nonconstexpr context) compare false even if at runtime in reality it is possible that allocator would give you back same address(since you deleted the first allocation).
As a bonus weirdness: you can only use == for comparison, <, > fail...
Here is the program with alleged UB in constexpr:
#include <iostream>
static constexpr bool f()
{
auto p = new int(1);
delete p;
auto q = new int(2);
delete q;
return p == q;
}
int main()
{
constexpr bool res1 = f();
std::cout << res1 << std::endl; // May output 0 or 1
}
godbolt

The result here is implementation-defined. res1 could be false, true, or ill-formed, based on how the implementation wants to define it. And this is just as true for equality comparison as it is for relational comparison.
Both [expr.eq] (for equality) and [expr.rel] (for relational) start by doing an lvalue-to-rvalue conversion on the pointers (because we have to actually read what the value is to do a comparison). [conv.lval]/3 says that the result of that conversion is:
Otherwise, if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
That is the case here: both pointers contain an invalid pointer value, as per [basic.stc.general]/4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
with a footnote reading:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.
So the value we get out of the lvalue-to-rvalue conversion is... implementation-defined. It could be implementation-defined in a way that causes those two pointers to compare equal. It could be implementation-defined in a way that causes those two pointers to compare not equal (as apparently all implementations do). Or it could even be implementation-defined in a way that causes the comparison between those two pointers to be unspecified or undefined behavior.
Notably, [expr.const]/5 (the main rule governing constant expressions), despite rejecting undefined behavior and explicitly rejecting any comparison whose result is unspecified ([expr.const]/5.23), says nothing about a comparison whose result is implementation-defined.
There's no undefined behavior here. Anything goes. Which is admittedly very weird during constant evaluation, where we'd expect to see a stricter set of rules.
Notably, with p < q, it appears that gcc and clang reject the comparison as being not a constant expression (which is... an allowed result) while msvc considers both p < q and p > q to be constant expressions whose value is false (which is... also an allowed result).

Why does it use the term "object" here when mentioning a prvalue?

As far as I know,in c++17 the concept/semantic of prvalue is no longer temporary object,so in many circumstances the copy elision is mandated.
However, today I came cross a description of return expression
If expression is a prvalue, the object returned by the function is initialized directly by that expression. This does not involve a copy or move constructor when the types match
Why does the term object occurs here? In value category the return of function which is not a reference type belongs to prvalue,so I think maybe it is inappropriate to use the term object.
According to my understanding,the prvalues are no longer objects now,they are just values,am I right?
As a supplement,here also use the term "object".

I agree with what you say. There is a discussion page on cppreference where you can raise your concerns. A better way to phrase it might be
If expression is a prvalue, the result object of the expression is initialized directly by that expression.
As you say, objects are no longer returned or passed around by prvalues.

The quoted text is pointing out that there is an object whose value is set to the return value of the function: with C++17's guaranteed elision when returning by value, that object will be something like a variable being created by the caller, an element in a vector the caller is emplacing or push_backing too, or an object being constructed in some dynamically allocated memory the caller's orchestrated. You're confusing this with a temporary, which as you say might not be involved.
Working through it systematically, you quote cppreference saying of...
return expression;
...that...
If expression is a prvalue, the object returned by the function is initialized directly by that expression. This does not involve a copy or move constructor when the types match
The C++17 Standard says in [basic.lval]:
The result of a prvalue is the value that the expression stores into its context. [ ... ] The result object of a prvalue is the object initialized by the prvalue; a prvalue that is used to compute the value of an operand of an operator or that has type cv void has no result object. [ Note: Except when the prvalue is the operand of a decltype-specifier, a prvalue of class or array type always has a result object. For a discarded prvalue, a temporary object is materialized; ...
So, in the cppreference text, what the standard terms the "result object" is being referred to as "the object returned by the function". The language is a little bit imprecise in saying the result object is "returned" rather than "initialised", but overall it's not misleading and - for having avoided yet another bit of terminology - may well be easier for most cppreference readers to understand. I'm not actively involved with the site, but as a regular user my impression is that cppreference is trying to accurately explain the essence of the standard but simplifying language a smidge when possible.
While the underlying mechanisms are left unspecified by the Standard - and the practicalities of optimisation / ABIs dictate different implementation - to get a sense of what the Standard requires happen functionally it might help to imagine the compiler implementing code like...
My_Object function() { return { a, b, c }; }
...
... {
My_Object my_object = function(); // caller
}
...by secretly passing a memory-for-returned-object address to the function (much like the this pointer to member functions)...
void function(My_Object* p_returned_object) {
new (p_returned_object) My_Object{ a, b, c };
}
So, there is an object involved and constructed by the called function, but its whereabouts is elided - i.e. a caller-specified memory address. If the function call result is unused by the caller, a temporary is at least notionally constructed then destructed.

What is the significance of special language in standard for lvalue-to-rvalue conversions for unsigned character types of indeterminate value

In the C++14 standard (n3797), the section on lvalue to rvalue conversions reads as follows (emphasis mine):
4.1 Lvalue-to-rvalue-conversion [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.
When an lvalue-to-rvalue conversion occurs in an unevaluated operand
or a subexpression thereof (Clause 5) the value contained in the
referenced object is not accessed. In all other cases, the result of the
conversion is determined according to the following rules:
If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.
Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.
Otherwise, the object indicated by the glvalue is the prvalue result.
[Note: See also 3.10]
What's the significance of this paragraph (in bold)?
If this paragraph were not here, then the situations in which it applies would lead to undefined behavior. Normally, I would expect that accessing an unsigned char value while it has an indeterminate value leads to undefined behavior. But, with this paragraph it means that
If I'm not actually accessing the character value, i.e. I'm immediately passing it to & or binding it to a reference, or
If the unsigned char does not have automatic storage duration,
then the conversion yields an unspecified value, and not undefined behavior.
Am I correct to conclude that this program:
#include <new>
#include <iostream>
// using T = int;
using T = unsigned char;
int main() {
T * array = new T[500];
for (int i = 0; i < 500; ++i) {
std::cout << static_cast<int>(array[i]) << std::endl;
}
delete[] array;
}
is well-defined by the standard, and must output a sequence of 500 unspecified ints, while the same program where T = int, would have undefined behavior?
IIUC, one of the reasons to make it UB to read things with indeterminate values, is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.
Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char that have indeterminate values, and get unspecified results instead of UB? I have this feeling that if they put this much effort into crafting this part of the rule, they had some motivation to help certain code examples that they cared about, or to be consistent with some other part of the standard, or simplify some other issue. But I have no idea what that might be.

In many situations, code will write some parts of a PODS or array without writing everything, and then use functions like memcpy or fwrite to copy or write the entire thing without regard for which parts had assigned values and which did not. Although it is not terribly common for C++ code to use byte-based operations to copy or write out the contents of aggregates, the ability to do so is a fundamental part of the language. Requiring that a program write definite values to all portions of an object, including those nothing will ever "care" about, would needlessly impair efficiency.

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Note: This question has been renamed and reduced to make it more focused and readable. Most of the comments refer to the old text.
According to the standard, objects of different type may not share the same memory location. So this would not be legal:
std::array<short, 4> shorts;
int* i = reinterpret_cast<int*>(shorts.data()); // Not OK
The standard, however, allows an exception to this rule: any object may be accessed through a pointer to char or unsigned char:
int i = 0;
char * c = reinterpret_cast<char*>(&i); // OK
However, it is not clear to me whether this is also allowed the other way around. For example:
char * c = read_socket(...);
unsigned * u = reinterpret_cast<unsigned*>(c); // huh?

Some of your code is questionable due to the pointer conversions involved. Keep in mind that in those instances reinterpret_cast<T*>(e) has the semantics of static_cast<T*>(static_cast<void*>(e)) because the types that are involved are standard-layout. (I would in fact recommend that you always use static_cast via cv void* when dealing with storage.)
A close reading of the Standard suggests that during a pointer conversion to or from T* it is assumed that there really is an actual object T* involved -- which is hard to fulfill in some of your snippet, even when 'cheating' thanks to the triviality of types involved (more on this later). That would be besides the point however because...
Aliasing is not about pointer conversions. This is the C++11 text that outlines the rules that are commonly referred to as 'strict aliasing' rules, from 3.10 Lvalues and rvalues [basic.lval]:
10 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
(This is paragraph 15 of the same clause and subclause in C++03, with some minor changes in the text with e.g. 'lvalue' being used instead of 'glvalue' since the latter is a C++11 notion.)
In the light of those rules, let's assume that an implementation provides us with magic_cast<T*>(p) which 'somehow' converts a pointer to another pointer type. Normally this would be reinterpret_cast, which yields unspecified results in some cases, but as I've explained before this is not so for pointers to standard-layout types. Then it's plainly true that all of your snippets are correct (substituting reinterpret_cast with magic_cast), because no glvalues are involved whatsoever with the results of magic_cast.
Here is a snippet that appears to incorrectly use magic_cast, but which I will argue is correct:
// assume constexpr max
constexpr auto alignment = max(alignof(int), alignof(short));
alignas(alignment) char c[sizeof(int)];
// I'm assuming here that the OP really meant to use &c and not c
// this is, however, inconsequential
auto p = magic_cast<int*>(&c);
*p = 42;
*magic_cast<short*>(p) = 42;
To justify my reasoning, assume this superficially different snippet:
// alignment same as before
alignas(alignment) char c[sizeof(int)];
auto p = magic_cast<int*>(&c);
// end lifetime of c
c.~decltype(c)();
// reuse storage to construct new int object
new (&c) int;
*p = 42;
auto q = magic_cast<short*>(p);
// end lifetime of int object
p->~decltype(0)();
// reuse storage again
new (p) short;
*q = 42;
This snippet is carefully constructed. In particular, in new (&c) int; I'm allowed to use &c even though c was destroyed due to the rules laid out in paragraph 5 of 3.8 Object lifetime [basic.life]. Paragraph 6 of same gives very similar rules to references to storage, and paragraph 7 explains what happens to variables, pointers and references that used to refer to an object once its storage is reused -- I will refer collectively to those as 3.8/5-7.
In this instance &c is (implicitly) converted to void*, which is one of the correct use of a pointer to storage that has not been yet reused. Similarly p is obtained from &c before the new int is constructed. Its definition could perhaps be moved to after the destruction of c, depending on how deep the implementation magic is, but certainly not after the int construction: paragraph 7 would apply and this is not one of the allowed situations. The construction of the short object also relies on p becoming a pointer to storage.
Now, because int and short are trivial types, I don't have to use the explicit calls to destructors. I don't need the explicit calls to the constructors, either (that is to say, the calls to the usual, Standard placement new declared in <new>). From 3.8 Object lifetime [basic.life]:
1 [...] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
the storage which the object occupies is reused or released.
This means that I can rewrite the code such that, after folding the intermediate variable q, I end up with the original snippet.
Do note that p cannot be folded away. That is to say, the following is defintively incorrect:
alignas(alignment) char c[sizeof(int)];
*magic_cast<int*>(&c) = 42;
*magic_cast<short*>(&c) = 42;
If we assume that an int object is (trivially) constructed with the second line, then that must mean &c becomes a pointer to storage that has been reused. Thus the third line is incorrect -- although due to 3.8/5-7 and not due to aliasing rules strictly speaking.
If we don't assume that, then the second line is a violation of aliasing rules: we're reading what is actually a char c[sizeof(int)] object through a glvalue of type int, which is not one of the allowed exception. By comparison, *magic_cast<unsigned char>(&c) = 42; would be fine (we would assume a short object is trivially constructed on the third line).
Just like Alf, I would also recommend that you explicitly make use of the Standard placement new when using storage. Skipping destruction for trivial types is fine, but when encountering *some_magic_pointer = foo; you're very much likely facing either a violation of 3.8/5-7 (no matter how magically that pointer was obtained) or of the aliasing rules. This means storing the result of the new expression, too, since you most likely can't reuse the magic pointer once your object is constructed -- due to 3.8/5-7 again.
Reading the bytes of an object (this means using char or unsigned char) is fine however, and you don't even to use reinterpret_cast or anything magic at all. static_cast via cv void* is arguably fine for the job (although I do feel like the Standard could use some better wording there).

This too:
// valid: char -> type
alignas(int) char c[sizeof(int)];
int * i = reinterpret_cast<int*>(c);
That is not correct. The aliasing rules state under which circumstances it is legal/illegal to access an object through an lvalue of a different type. There is an specific rule that says that you can access any object through a pointer of type char or unsigned char, so the first case is correct. That is, A => B does not necessarily mean B => A. You can access an int through a pointer to char, but you cannot access a char through a pointer to int.
For the benefit of Alf:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.

Regarding the validity of …
alignas(int) char c[sizeof(int)];
int * i = reinterpret_cast<int*>(c);
The reinterpret_cast itself is OK or not, in the sense of producing a useful pointer value, depending on the compiler. And in this example the result isn't used, in particular, the character array isn't accessed. So there is not much more that can be said about the example as-is: it just depends.
But let's consider an extended version that does touch on the aliasing rules:
void foo( char* );
alignas(int) char c[sizeof( int )];
foo( c );
int* p = reinterpret_cast<int*>( c );
cout << *p << endl;
And let's only consider the case where the compiler guarantees a useful pointer value, one that would place the pointee in the same bytes of memory (the reason that this depends on the compiler is that the standard, in §5.2.10/7, only guarantees it for pointer conversions where the types are alignment-compatible, and otherwise leave it as "unspecified" (but then, the whole of §5.2.10 is somewhat inconsistent with §9.2/18).
Now, one interpretation of the standard's §3.10/10, the so called "strict aliasing" clause (but note that the standard does not ever use the term "strict aliasing"),
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
is that, as it itself says, concerns the dynamic type of the object residing in the c bytes.
With that interpretation, the read operation on *p is OK if foo has placed an int object there, and otherwise not. So in this case, a char array is accessed via an int* pointer. And nobody is in any doubt that the other way is valid: even though foo may have placed an int object in those bytes, you can freely access that object as a sequence of char values, by the last dash of §3.10/10.
So with this (usual) interpretation, after foo has placed an int there, we can access it as char objects, so at least one char object exists within the memory region named c; and we can access it as int, so at least that one int exists there also; and so David’s assertion in another answer that char objects cannot be accessed as int, is incompatible with this usual interpretation.
David's assertion is also incompatible with the most common use of placement new.
Regarding what other possible interpretations there are, that perhaps could be compatible with David's assertion, well, I can't think of any that make sense.
So in conclusion, as far as the Holy Standard is concerned, merely casting oneself a T* pointer to the array is practically useful or not depending on the compiler, and accessing the pointed to could-be-value is valid or not depending on what's present. In particular, think of a trap representation of int: you would not want that blowing up on you, if the bitpattern happened to be that. So to be safe you have to know what's in there, the bits, and as the call to foo above illustrates the compiler can in general not know that, like, the g++ compiler's strict alignment-based optimizer can in general not know that…

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js