While reading the comments on a C++ Weekly video about constexpr new support in C++20, I found a comment alleging that C++20 allows UB in a constexpr context.
At first I was convinced the comment was right, but the more I thought about it, the more I began to suspect that the C++20 wording contains some clever language that makes this defined behavior.
Either that all transient allocations return unique addresses, or maybe some more general notion in C++ that makes two distinct allocation pointers always compare unequal (even in a non-constexpr context), even though at runtime the allocator could in reality give you back the same address (since you deleted the first allocation).
As a bonus weirdness: you can only use == for the comparison; < and > fail...
Here is the program with alleged UB in constexpr:
#include <iostream>

static constexpr bool f()
{
    auto p = new int(1);
    delete p;
    auto q = new int(2);
    delete q;
    return p == q;
}

int main()
{
    constexpr bool res1 = f();
    std::cout << res1 << std::endl; // May output 0 or 1
}
godbolt
The result here is implementation-defined. res1 could be false, it could be true, or the program could be ill-formed, based on how the implementation wants to define it. And this is just as true for equality comparison as it is for relational comparison.
Both [expr.eq] (for equality) and [expr.rel] (for relational) start by doing an lvalue-to-rvalue conversion on the pointers (because we have to actually read what the value is to do a comparison). [conv.lval]/3 says that the result of that conversion is:
Otherwise, if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
That is the case here: both pointers contain an invalid pointer value, as per [basic.stc.general]/4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
with a footnote reading:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.
So the value we get out of the lvalue-to-rvalue conversion is... implementation-defined. It could be implementation-defined in a way that causes those two pointers to compare equal. It could be implementation-defined in a way that causes those two pointers to compare not equal (as apparently all implementations do). Or it could even be implementation-defined in a way that causes the comparison between those two pointers to be unspecified or undefined behavior.
Notably, [expr.const]/5 (the main rule governing constant expressions), despite rejecting undefined behavior and explicitly rejecting any comparison whose result is unspecified ([expr.const]/5.23), says nothing about a comparison whose result is implementation-defined.
There's no undefined behavior here. Anything goes. Which is admittedly very weird during constant evaluation, where we'd expect to see a stricter set of rules.
Notably, with p < q, it appears that gcc and clang reject the comparison as being not a constant expression (which is... an allowed result) while msvc considers both p < q and p > q to be constant expressions whose value is false (which is... also an allowed result).
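For concreteness, here is the relational variant being described (a minimal sketch; the function name f_rel is mine):

static constexpr bool f_rel()
{
    auto p = new int(1);
    delete p;
    auto q = new int(2);
    delete q;
    return p < q;
}

// constexpr bool res = f_rel(); // gcc/clang: rejected as not a constant
//                               // expression; msvc: compiles, res == false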
Related
Shall one expect a reliable failure of constant evaluation if it reads a variable outside of its lifetime?
For example:
constexpr bool g() {
    int * p = nullptr;
    {
        int c = 0;
        p = &c;
    }
    return *p == 0;
}

int main() {
    static_assert( g() );
}
Here Clang stops with the error
read of object outside its lifetime is not allowed in a constant expression
But GCC accepts the program silently (Demo).
Are both compilers within their rights, or must GCC fail the compilation as well?
GCC dropped the ball.
[expr.const]
5 An expression E is a core constant expression unless the
evaluation of E, following the rules of the abstract machine
([intro.execution]), would evaluate one of the following:
...
an operation that would have undefined behavior as specified in [intro] through [cpp];
...
Indirection through a dangling pointer has undefined behavior.
[basic.stc.general]
4 When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
So the invocation of g() may not be a constant expression, and may not appear in the condition of a static_assert which must be constant evaluated.
The program is ill-formed.
The above quotes are from the C++20 standard draft, but C++17 has them too.
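For contrast, a variant that keeps the variable alive across the read is a perfectly valid constant expression (the name g_fixed is mine, for illustration):

constexpr bool g_fixed() {
    int c = 0;
    int* p = &c;      // c is alive for the entire evaluation
    return *p == 0;   // reads a live object: fine in a constant expression
}

static_assert( g_fixed() );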
Shall one expect a reliable failure of constant evaluation if it reads a variable outside of its lifetime?
Yes, but your example doesn't necessarily do this. Its behavior is implementation-defined.
When the block with the variable c exits ([basic.stc.auto]/1), the value of p becomes an invalid pointer value ([basic.stc.general]/4).
When *p is evaluated, the lvalue-to-rvalue conversion ([conv.lval]) is applied to p. And [conv.lval]/3 says:
The result of the conversion is determined according to the following rules:
...
— Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
So.
Are both compilers within their rights, or must GCC fail the compilation as well?
AFAIK neither implementation defines its behavior here, but I think it could theoretically be defined in such a way that neither the conversion nor the rest of the evaluation would make g() not a constant expression.
int main(){
    int v = 1;
    char* ptr = reinterpret_cast<char*>(&v);
    char r = *ptr; //#1
}
In this snippet, the expression ptr points to an object of type int, as per:
expr.static.cast#13
Otherwise, the pointer value is unchanged by the conversion.
Indirection through ptr will result in a glvalue that denotes the object ptr points to, as per
expr.unary#op-1
the result is an lvalue referring to the object or function to which the expression points.
Accessing an object through a glvalue of a permitted type does not result in UB, as per
basic.lval#11
If a program attempts to access ([defns.access]) the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:
a char, unsigned char, or std::byte type.
It seems it also does not violate the following rule:
expr#pre-4
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Assume the width of char in the test environment is 8 bits, so its range is [-128, 127]. The value of v is 1. So, does this mean the snippet at #1 does not result in UB?
By contrast, given the following example:
int main(){
    int v = 2147483647; // or any value greater than 127
    char* ptr = reinterpret_cast<char*>(&v);
    char r = *ptr; //#2
}
#2 would be UB, right?
It is the intention of the language that both snippets be implementation-defined. I believe they were, until C++17, which broke support for that language feature. See the defect report here. As far as I know, this has not been fixed in C++20.
Currently, the portable workaround for accessing the memory representation is to use std::memcpy (example):
#include <cstring>

char foo(int v){
    return *reinterpret_cast<char*>(&v);
}

char bar(int v)
{
    char buffer[sizeof(v)];
    std::memcpy(buffer, &v, sizeof(v));
    return *buffer;
}
foo is technically UB while bar is well defined. The reason foo is UB is by omission: anything the standard fails to define is by definition UB, and the standard, in its current state, fails to define the behavior of this code.
bar produces the same assembly as foo with gcc 10. For simple cases, the actual copy is optimized out.
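The same std::memcpy pattern extends to inspecting the whole object representation; a small sketch (the helper name dump_bytes is mine):

#include <cstdio>
#include <cstring>

void dump_bytes(int v)
{
    unsigned char buffer[sizeof v];
    std::memcpy(buffer, &v, sizeof v);  // well-defined copy of the bytes
    for (unsigned char b : buffer)
        std::printf("%02x ", static_cast<unsigned>(b)); // values depend on endianness
    std::printf("\n");
}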
Regarding your rationale, the reasoning seems sound except that, in my opinion, the rule you cite for the pointer conversion (expr.static.cast#13) doesn't have the effect you expect in this case. The pointer must point to the underlying representation, which is poorly defined, as the linked defect describes. The fact that the pointer's value doesn't change does not mitigate the fact that it points to a different object. C++ allows objects to have the same address if their types are different, such as the first member in a standard-layout class sharing the same address as the owning instance.
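A quick sketch of that last point:

#include <cstddef>

struct S { int first; };  // standard-layout class

// The first member lives at the same address as the enclosing S object,
// yet S and int are distinct objects of distinct types.
static_assert(offsetof(S, first) == 0, "first member shares the object's address");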
Note that the author of the defect report came to the same conclusion as you regarding snippet #1, but I disagree. Because we are dealing with a language defect, and one that conflicts with stated intentions, it is hard to definitively prove one behavior correct. The fundamental rules these arguments would be based on are known to be flawed in this particular case.
Does it mean the snippet at #1 does not result in UB?
Yes, the quoted rules mean that #1 is well defined.
#2 would be UB, Right?
No, as per the quoted rules, the behaviour of #2 is also well defined.
The type of ptr is char*, therefore the type of the expression *ptr is char, whose value cannot exceed the range representable by char; thus expr#pre-4 does not apply.
Assume the width of char in the test circumstance is 8 bits, its range is [-128, 127].
This assumption is not necessary in order for #1 to be well defined.
The value of v is 1
This does not follow from the above assumption alone. It may be true in practice on a little-endian CPU (given the previous assumptions), although the standard doesn't specify the representation exactly.
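For illustration, a hedged sketch (first_byte_is_one is a name of my own): whether the lowest-addressed byte of an int holding 1 is itself 1 depends on byte order.

#include <cstring>

bool first_byte_is_one()
{
    int v = 1;
    unsigned char b;
    std::memcpy(&b, &v, 1);  // read the lowest-addressed byte
    return b == 1;           // true on little-endian, false on big-endian
}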
Let's say I perform the following:
#include <cstdint>

void g(int* x)
{
    int y = 0;
    auto diff = std::uintptr_t(&y) - std::uintptr_t(x);
}

void f()
{
    int x = 0;
    g(&x);
}
Does diff merely have an undefined value, or does the code invoke undefined behaviour? According to the specification, is the code guaranteed to run nicely and compute a value for diff, possibly meaningless, or does it invoke UB? I believe there's something about unrelated variables, but I could not pinpoint it.
I'm interested in answers regarding any standard since (and including) C++11.
Discussion arose from comments in: Print stack in C++
To quote the C++11 standard draft on the subject of converting a pointer to an integer:
[expr.reinterpret.cast]
5 A value of integral type or enumeration type can be explicitly
converted to a pointer. A pointer converted to an integer of
sufficient size (if any such exists on the implementation) and back to
the same pointer type will have its original value; mappings between
pointers and integers are otherwise implementation-defined.
Since uintptr_t must be defined for your code to compile, there exists an integer type on the target machine capable of being the target of the pointer-to-integer conversion. The mapping is implementation-defined, but most importantly the result is not indeterminate. This means you obtain some valid integer for both conversions.
So the subtraction is not undefined behavior. But the result is implementation defined.
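To make the guarantees concrete, a sketch of what the quoted wording does and does not promise:

#include <cstdint>

void g(int* x)
{
    int y = 0;
    auto a = reinterpret_cast<std::uintptr_t>(&y);
    auto b = reinterpret_cast<std::uintptr_t>(x);

    auto diff = a - b;  // well-defined unsigned arithmetic;
                        // the resulting value is implementation-defined
    (void)diff;

    // The only hard guarantee: the round trip restores the pointer.
    int* y_again = reinterpret_cast<int*>(a);  // y_again == &y
    (void)y_again;
}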
Converting a pointer to an integer of sufficient size is well defined, and subtracting one unsigned integer from another is well defined regardless of their values. There is no undefined behaviour here.
But the standard doesn't guarantee any particular value for the converted integers, and therefore none for the result of their subtraction.
If you know two pieces of information:
A memory address.
The type of the object stored in that address.
Then you logically have all you need to reference that object:
#include <iostream>
using namespace std;

int main()
{
    int x = 1, y = 2;
    int* p = (&x) + 1;

    if ((long)&y == (long)p)
        cout << "p now contains &y\n";
    if (*p == y)
        cout << "it also dereferences to y\n";
}
However, this isn't legal per the C++ standard. It works in several compilers I tried, but it's Undefined Behavior.
The question is: why?
It wreaks havoc with optimizations.
void f(int* x);

int g() {
    int x = 1, y = 2;
    f(&x);
    return y;
}
If you can validly "guess" the address of y from x's address, then the call to f may modify y and so the return statement must reload the value of y from memory.
Now consider a typical function with more local variables and more calls to other functions, where you'd have to save the value of every variable to memory before each call (because the called function may inspect them) and reload them after each call (because the called function may have modified them).
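Concretely, because f cannot legitimately reach y through &x, a compiler is free to compile g as if it were written like this (an illustrative rewrite, not actual compiler output):

void f(int* x);

int g() {
    int x = 1;
    f(&x);
    return 2;  // y's value folded in; y need not exist in memory at all
}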
If you want to treat pointers as a numeric type, firstly you need to use std::uintptr_t, not long. That's the first undefined behavior, but not the one you're talking about.
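A version of the comparison with the cast fixed might look like this (a sketch; what the result tells you is still unspecified, as discussed below):

#include <cstdint>

int main()
{
    int x = 1, y = 2;
    int* p = &x + 1;
    // std::uintptr_t is guaranteed to hold any pointer value; long is not.
    bool same = reinterpret_cast<std::uintptr_t>(&y)
             == reinterpret_cast<std::uintptr_t>(p);
    (void)same;
}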
It works in several compilers I tried, but it's Undefined Behavior.
The question is: why?
Okay, so the comments section went off when I called this undefined behavior. It's actually unspecified behavior (which is not quite the same thing as implementation-defined behavior: the implementation isn't required to document the result).
You are trying to compare two distinct, unrelated pointers:
&x + 1
&y
The pointer &x+1 is a one-past-the-end pointer. The standard allows you to have such a pointer, but the behavior is only defined when you use it to compare against pointers based on x. The behavior is not specified if you compare it with anything else: [expr.eq § 3.1]
The compiler is free to put y anywhere it chooses, including in a register. As such, there is no guarantee that &y and &x+1 are related.
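To contrast the defined and unspecified cases (a sketch):

void demo()
{
    int x = 1, y = 2;
    int* end = &x + 1;     // valid one-past-the-end pointer

    bool a = (end == &x);  // well-defined: false
    bool b = (end == &y);  // result unspecified ([expr.eq § 3.1])
    (void)a; (void)b;
}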
As an exercise to someone who wants to show whether this is in fact undefined behavior or not, perhaps start here:
[basic.stc.dynamic.safety § 3.4]:
An integer value is an integer representation of a safely-derived pointer only if its type is at least as large as std::intptr_t and it is one of the following: ...
3.4 the result of an additive or bitwise operation, one of whose operands is an integer representation of a safely-derived pointer value P, if that result converted by reinterpret_cast<void*> would compare equal to a safely-derived pointer computable from reinterpret_cast<void*>(P).
[basic.compound § 3.4] :
Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address
If you know the address and type of an object and your implementation has relaxed pointer safety [basic.stc.dynamic.safety §4], then it should be legal to just access the object at that address through an appropriate lvalue, I think.
The problem is that the standard does not guarantee that local variables of the same type are allocated contiguously, with addresses increasing in order of declaration. So you cannot derive the address of y from that computation on the address of x. Apart from that, pointer arithmetic leads to undefined behavior if you go more than one element past an object ([expr.add]). So while (&x) + 1 is not undefined behavior yet, even just computing (&x) + 2 would be…
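In code, the boundary looks like this (a sketch):

void demo()
{
    int x = 0;
    int* p1 = &x + 1;       // OK: one past the end of x
    (void)p1;
    // int* p2 = &x + 2;    // undefined behavior: arithmetic past the
                            // one-past-the-end position ([expr.add])
}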
The code is legal per the C++ standard (i.e. should compile), but as you already noted the behaviour is undefined. This is because the order of variable declaration does not imply that they will be arranged in memory in the same way.
Code sample:
struct name
{
    int a, b;
};

int main()
{
    &(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null"; however, C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however, it doesn't say the same for ->, and it does not define a->b as being (*a).b; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However, I feel that the bolded text is defective and should read "an lvalue that potentially designates an object", as per 6.3.2.1/1 (the definition of lvalue). C99 messed up the definition of lvalue, so C11 had to rewrite it, and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB; however, it is fairly tenuous, and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a language-lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you cannot find a path through it in which no UB occurs. IMHO the root cause is that at some point you apply the -> operator to an expression that does not point to an object.
From a compiler point of view, assuming the compiler writer did not overcomplicate things, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that, provided it compiles without error, any existing compiler will give that result.
As written, we could not blame a compiler that noticed that in the inner part you use operator -> on an expression that cannot point to an object (since it is null) and issued a warning or an error.
My conclusion is that until there is a special paragraph saying that it is legal to dereference a null pointer provided you only take the address of the result, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object, and not defined (= undefined) otherwise. In general you shouldn't read more into the term undefined; it means just that: the standard doesn't provide a meaning for your code. (Sometimes it explicitly points out situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is slackness that was introduced to help compiler builders deal with things. They may define a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or something similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
The C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. The C Standard is correct to keep it undefined, and it provides the macro offsetof, which handles the problem internally.
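For instance, the portable spelling is simply (a sketch):

#include <cstddef>
#include <cstdio>

struct name { int a, b; };

int main()
{
    // Well-defined way to get the offset of b, however the
    // implementation computes it internally.
    std::printf("%zu\n", offsetof(name, b));
}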
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is a null pointer constant.
But the value of the resulting null pointer is not defined to be 0; its representation is implementation-defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer has all bits set, and where using member access on that value results in an overflow, which is undefined behavior.
Another problem is how to evaluate &(*E).m. Do the parentheses apply, and is * evaluated first? Keeping it undefined sidesteps this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard imposes any requirements on what a system could do with the expression. When the standard was written, it would have been perfectly reasonable for it to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit triggers a trap when attempting to add an integer to a null pointer (which should, for robustness, be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set, because code to set it would have been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0, which is perfectly legal as well. The Amiga, for example, kept the pointer to the kernel at address 0x4, so you could use a method like this to call kernel functions.
In fact, the same approach is used in the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).