Try as I might, the closest answer I've seen is this, with two completely opposing answers(!)
The question is simple, is this legal?
auto p = reinterpret_cast<int*>(0xbadface);
*p; // legal?
My take on the matter
Casting integer to pointer: no restrictions on what may be casted
Indirection: only states the result is a lvalue.
Lifetimes: only states what can't be done on objects, there is no object here
Expression statements: *p is a discarded value expression
Discarded value expressions: no lvalue-to-rvalue conversion occurs
Undefined-ness of lvalues: aka strict aliasing rule, only if the lvalue is converted to a rvalue
So I conclude there is nothing explicitly saying this is undefined behaviour. Yet I distinctively remember that some platforms trap on indirection for invalid pointers. What went wrong with my reasoning?
[basic.compound] says:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value ([conv.ptr]) for that type, or
an invalid pointer value.
By the process of elimination we can deduce that p is an invalid pointer value.
[basic.stc] says:
Indirection through an invalid pointer value and passing an invalid
pointer value to a deallocation function have undefined behavior. Any
other use of an invalid pointer value has implementation-defined
behavior.
As indirection operator is said to perform indirection by [expr.unary.op], I would say, that expression *p causes UB no matter if the result is used or not.
... some platforms trap on indirection for invalid pointers.
Most platforms trap on invalid address access. This does not contradict the issue in any way. The question of what happens in *p; boils down to whether an attempt to actually fetch at an invalid address takes place or not.
The question of fetching is very similar to the core issue 232 (indirection through a null pointer). As you have already pointed out, *p; is a discarded value expression, and as such no lvalue-to-rvalue conversion ("fetching") takes place:
Tom Plum:
...it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior.
And subsequently:
Notes from the October 2003 meeting:
We agreed that the approach in the standard seems okay: p = 0; *p; is
not inherently an error. An lvalue-to-rvalue conversion would give it
undefined behavior.
As to whether or not reinterpret_cast<int*>(0xbadface) produces a valid pointer, indeed in implementations with strict pointer safety, it wouldn't be a safely-derived pointer, and as such is invalid and any use of it is UB.
But in case of relaxed pointer safety the resulting pointer is valid (otherwise it would be impossible to use pointers returned from binary libraries and components written in C or other languages).
Related
Is address computation on a null pointer defined behavior in C++? Here's a simple example program.
struct A { int x; };
int main() {
A* p = nullptr;
&(p->x); // is this undefined behavior?
return 0;
}
Thanks.
EDIT Subscripting is covered in this other question.
&(p->x); // is this undefined behavior?
Standard is a bit vague regarding this:
[expr.ref] ... The expression E1->E2 is converted to the equivalent form (*(E1)).E2;
[expr.unary.op] The unary * operator ... the result is an lvalue referring to the object ... to which the expression points.
There is no explicit mention of UB in the section. The quoted rule does appear to conflict with the fact that the null pointer doesn't point to any object. This could be interpreted that yes, behaviour is undefined.
[expr.unary.op] The result of the unary & operator is a pointer to its operand. ... if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object ([intro.memory]).
Again, no designated object exists. Note that at no point is the operand lvalue converted to an rvalue, which would definitely have been UB.
Back in 2000 there was CWG issue to clarify whether indirection through null is undefined. The proposed resolution (2004), that would clarify that indirection through null is not UB, appears to not have been added to the standard so far.
However whether it is or isn't UB doesn't matter much since you don't need to do this. At the very least, the resulting pointer will be invalid and thus useless.
If you were planning to convert the pointer to an integer to get the offset of the member, there is no need to do this because you can instead us the offsetof macro from the standard library, which doesn't have UB.
&(p[1]); // undefined?
Here, behaviour is quite clearly undefined:
[expr.sub] ... The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
[expr.add] When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0 (does not apply)
Otherwise, if P points to an array element (does not apply)
Otherwise, the behavior is undefined.
&(p[0]); // undefined?
As per previous rules, the first option applies:
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
And now we are back to the question of whether indirection through this null is UB. See the beginning of the answer.
Still, doesn't really matter. There is no need to write this, since this is simply unnecessarily complicated way to write sizeof(int) * i (with i being 1 and 0 respectively).
Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).
There are many questions on Stack Overflow that explain that the following is undefined behavior in C++:
MyType* p = nullptr;
p->DoSomething();
but I can't find one that cites the C++ standard. Which part of the C++11 and/or C++14 standards say that this is undefined behavior?
C++14 [expr.ref]/2:
The expression E1->E2 is converted to the equivalent form (*(E1)).E2
C++14 [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
The pointer does not point to an object, therefore this quote does not define the behaviour of *p. Nowhere else in the standard defines it either, so it is undefined behaviour.
Regarding whether a null pointer can be said to point to an object, N4618 [basic.compound]/3 defines pointer values as:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object, or
the null pointer value for that type, or
an invalid pointer value.
which indicates that the null pointer value does not point to an object.
The result of some pointer casts are described as unspecified. For example, [expr.static.cast]/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” [...] If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A. The result of any other such pointer conversion is unspecified.
My question is: in the case where alignment is not satisfied, what are the possible results?
For example, are the following results permitted?
a null pointer
an invalid pointer value (i.e. pointer which does not point to allocated storage of size T)
a valid pointer to a T in a completely separate part of memory
Code sample for reference:
#include <iostream>
int main(int argc, char **argv)
{
int *b = (int *)"Hello, world"; // (1)
*b = -1; // (2)
std::cout << argc << '\n';
}
Line (1) triggers my above quote from [expr.static.cast]/13 because it is a reinterpret_cast which is covered by [expr.reinterpret.cast]/7 which defines the conversion in terms of static_casting through void *.
If the unspecified result may be an invalid pointer value, then line (1) may cause a hardware trap. (Reference: N4430 which clarifies similar wording that was in C++14 and C++11).
Corollary question: is there any case in which line 1 would cause undefined behaviour? (I don't think so at this stage; since C++14 invalid pointer value reading is implementation-defined or causes a hardware trap).
Also interesting is that line (2) would in most cases be undefined behaviour due to strict aliasing violation (and perhaps other reasons too), however if the unspecified result may be &argc then this program could output -1 without triggering undefined behaviour!
My question is: in the case where alignment is not satisfied, what are the possible results?
As far as I can tell N4303: Pointer safety and placement new partially answers this question, although somewhat indirectly. This paper refers to CWG issue 1412: Problems in specifying pointer conversions which brought about the changes to [expr.static.cast]/13 that you reference, specifically adding:
[...]If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A. The result of any other such pointer conversion is unspecified.[...]
In reference to this change N4303 says (emphasis mine):
Prior to the adoption of the resolution for DR 1412 [CWG1412], the value of bp is unspecified at the point of its initialization and its subsequent passing to operator new via the new-expression. Said pointer may be null, insufficiently aligned or otherwise dangerous to use.
So an unspecified conversion can results in:
A null pointer
An insufficiently aligned pointer
A pointer that is dangerous to use
As far as I understand the wording in 5.2.9 Static cast, the only time the result of a void*-to-object-pointer conversion is allowed is when the void* was a result of the inverse conversion in the first place.
Throughout the standard there is a bunch of references to the representation of a pointer, and the representation of a void pointer being the same as that of a char pointer, and so on, but it never seems to explicitly say that casting an arbitrary void pointer yields a pointer to the same location in memory, with a different type, much like type-punning is undefined where not punning back to an object's actual type.
So while malloc clearly returns the address of suitable memory and so on, there does not seem to be any way to actually make use of it, portably, as far as I have seen.
C++0x standard draft has in 5.2.9/13:
An rvalue of type “pointer to cv1
void” can be converted to an rvalue of
type “pointer to cv2 T,” where T is an
object type and cv2 is the same
cv-qualification as, or greater
cv-qualification than, cv1. The null
pointer value is converted to the null
pointer value of the destination type.
A value of type pointer to object
converted to “pointer to cv void” and
back, possibly with different
cv-qualification, shall have its
original value.
But also note that the cast doesn't necessarily result in a valid object:
std::string* p = static_cast<std::string*>(malloc(sizeof(*p)));
//*p not a valid object
C++03, §20.4.6p2
The contents are the same as the Standard C library header <stdlib.h>, with the following changes: [list of changes that don't apply here]
C99, §7.20.3.3p2-3
(Though C++03 is based on C89, I only have C99 to quote. However, I believe this section is semantically unchanged. §7.20.3p1 may also be useful.)
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
The malloc function returns either a null pointer or a pointer to the allocated space.
From these two quotes, malloc allocates an uninitialized object and returns a pointer to it, or returns a null pointer. A pointer to an object which you have as a void pointer can be converted to a pointer to that object (first sentence of C++03 §5.2.9p13, mentioned in the previous answer).
This should be less "handwaving", which you complained of, but someone might argue I'm "interpreting" C's definition of malloc as I wish, by, for example, noticing C says "to the allocated space" rather than "to the allocated object". To those people: first realize that "space" and "object" are synonyms in C, and second please file a defect report with the standard committees, because not even I am pedantic enough to continue. :)
I'll give you the benefit of the doubt and believe you got tripped up in the cross-references, cross-interpretation, and sometimes-confused integration between the standards, rather than "space" vs "object".
Throughout the standard there is a bunch of references to the representation of a pointer, and the representation of a void pointer being the same as that of a char pointer,
Yes, indeed.
So while malloc clearly returns the address of suitable memory and so on, there does not seem to be any way to actually make use of it, portably, as far as I have seen.
Of course there is:
void *vp = malloc (1);
char *cp;
memcpy (&cp, &vb, sizeof cp);
*cp = ' ';
There is one tiny problem : it does not work for any other type. :(