Why can't you take the address of nullptr? - c++

In the C++11 standard, I don't understand the reason why taking the address of nullptr is disallowed whereas one is allowed to take the address of their own std::nullptr_t instances. Aside from the fact that nullptr is a reserved keyword, is there any designated reasoning for this decision?
Simply because it amuses me, I attempted to circumnavigate this restriction with the following function:
decltype(nullptr)* func(const decltype(nullptr) &nref) noexcept
{
return const_cast<decltype(nullptr)*>(reinterpret_cast<const decltype(nullptr)*>(&nref));
}
I had to use reinterpret_cast on the parameter because without it I was getting the hysterical error:
error: invalid conversion from 'std::nullptr_t*' to 'std::nullptr_t*' [-fpermissive]
When I call this function by passing nullptr directly I get a different address each time. Is nullptr dynamically assigned an address just-in-time for comparisons and such? Or (probably more likely) perhaps is the compiler forcing a temporary copy of the underlying object?
Of course none of this is vital information, I just find it interesting why this particular restriction was implemented (and subsequently why I am seeing the behavior I am).

It's the same as not being able to take the address of 5 even though you can take the address of an int after giving it the value 5. It doesn't matter that there's no alternative value for a nullptr_t to have.
Values don't have addresses; objects do.
A temporary object is generated when you pass such a value to a const & parameter, or otherwise bind a value to a const reference, such as by static_cast< T const & >( … ) or declaring a named reference T const & foo = …;. The address you're seeing is that of the temporary.

If you're after a standard answer, § 18.2/9 puts your observations pretty bluntly:
Although nullptr’s address cannot be taken, the address of another nullptr_t object that is an lvalue can
be taken.
Alternatively, § 2.14.7 says this about nullptr:
The pointer literal is the keyword nullptr. It is a prvalue of type std::nullptr_t.
So what is a prvalue? § 3.10/1 answers that:
A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [ Example: The result of calling a function
whose return type is not a reference is a prvalue. The value of a literal such as 12, 7.3e5, or true is
also a prvalue. — end example ]
Hopefully, trying to take the address of any of those things in the example will make more sense as to why you can't take the address of nullptr. It's part of those examples!

nullptr is a (literal) constant, and these don't have a memory address, like any other literal constant in your code. It's similar to 0, but of the special std::nullptr_t type instead of void* to avoid problems with overloading (pointers vs. integers).
But if you define your own variable with the value nullptr, it has a memory address, so you can take its address.
The same holds for any other literal constant (which in C++ fall under the category prvalue) of any other type, since literal constants aren't stored in your program (only as parts of expressions where they occur), that's why it doesn't make any sense to talk about addresses. However, constant variables do have addresses, to point out the difference.

Both true and false are keywords and as literals they have a type ( bool ). nullptr is a pointer literal of type std::nullptr_t, and it's a prvalue (you cannot take the address of it using &), also nullptr is prvalue so you can't take its address,literal constants are not stored in your program.
It doesn't make sense to have address.

Related

What is the idiom here when static_cast is used on null pointer?

Static analysis flagged this code as nullPointerArithmetic:
static_cast<BYTE*>(NULL) + p_row_fields->offsets.back(), // field offset
where NULL is defined as #define NULL 0
and offsets resolves through typedef to std::vector<int>
The line in question is passed as a BYTE* argument to a method call.
My question is - What could be the purpose of this idiom?
Is there any difference between what's shown and the less eclectic direct cast:
static_cast<BYTE*>(p_row_fields->offsets.back())
Null pointer constant converts to any pointer type, resulting in null pointer value of that pointer type. Thus, static_cast<BYTE*>(NULL) yields a null pointer of type BYTE*. This conversion works implicitly as well.
Strictly speaking, the behaviour of the pointer arithmetic on null is undefined by the standard because (or assuming that) there is no array at the null address.
As far as what the behaviour might be in practice assuming the compiler allows this, one might expect it to behave same as :
reinterpret_cast<Byte*>(
static_cast<std::intptr_t>(
p_row_fields->offsets.back()
)
)
While this is not UB, there is still technically no standard guarantee that the resulting address is what was intended.
Is there any difference between what's shown and the less eclectic direct cast:
static_cast<BYTE*>(p_row_fields->offsets.back())
Yes, there is a difference. Of integer expressions, only compile time constant prvalues with value 0 are convertible to pointer types. Values other than 0, and lvalues such as p_row_fields->offsets.back() can not be static-casted to pointers.
As such, the quoted cast is ill-formed.

Why can't casting an address to int* be an lvalue but casting to a struct pointer can?

I suspect this is true for all primitive types in C/C++.
For example, if you do this:
((unsigned int*)0x1234) = 1234;
The compiler will not let it pass. Whereas if you do this
((data_t*)0x1234 )->s = 1234;
where data_t is a struct, the compiler allows it.
This seems to be the case for at least two compilers I experimented on, one ARM GCC, one TDM-GCC.
Why is this?
The first code snippet doesn't work because the left hand side is not an lvalue. It is only a pointer value, and pointers by themselves are not lvalues.
The second code snippet works because a pointer is being dereferenced, and a dereferenced pointer is an lvalue. It may not be immediately clear from the syntax this is the case, so let's rewrite this:
((data_t*)0x1234 )->s = 1234;
As:
(*(data_t*)0x1234).s = 1234;
Now we can see that the value which is casted to a pointer is dereferenced to an lvalue of struct type, and a member of that struct is subsequently accessed and assigned to.
This is described in section 6.5.2.3p4 of the C standard regarding the -> operator:
A postfix expression followed by the -> operator and an identifier
designates a member of a structure or union object. The value
is that of the named member of the object to which the first
expression points, and is an lvalue. If the first expression is a
pointer to a qualified type, the result has the so-qualified
version of the type of the designated member.
Regarding the first snippet, section 6.5.4p5 regarding the typecast operator states:
Preceding an expression by a parenthesized type name converts
the value of the expression to the named type. This construction is
called a cast. 104) A cast that specifies no conversion has
no effect on the type or value of an expression.
Where footnote 104 states:
A cast does not yield an lvalue. Thus, a cast to a qualified
type has the same effect as a cast to the unqualified version
of the type.
So this describes why the first snippet won't compile but the second snippet will.
However, treating an arbitrary value as a pointer and dereferencing it is implementation defined behavior at best, and most likely undefined behavior.
Your examples are:
((unsigned int*)0x1234) = 1234;
((data_t*)0x1234 )->s = 1234;
Neither ((unsigned int*)0x1234) nor ((data_t*)0x1234 ) is an lvalue, and you can't assign to either of them.
More generally, the prefix of -> doesn't have to be an lvalue. But prefix->member is always an lvalue, whether prefix is or not. Similarly, *p is an value whether p is an lvalue or not.

Definition ambiguity about whether dereference operator yields object vs value in an expression

Recently, I have started learning C++ and following the first book from this StackOverflow resource. My question is pretty straight forward. The way * (in an expression) is defined as in the three current resources I am referring to learn C++ are as follows.
* in this context is the dereference operator.
C++ Primer says * yields an object.
Wikipedia says * returns an l-value equivalent to the value at the pointer address.
A Tour of C++ says * returns the content of object.
Where, 2 and 3 make complete sense to me but not 1. Can someone dumb it down for me?
In my understanding an object is simply a memory location which holds some value.
However, Can object be thrown as keyword to reflect the value in context to dereferencing a pointer as in 1?
Examples are greatly appreciated.
Edit: Solved, Answers by user eerorika and bolov are perfect and elegant. However eerorika answered first so I will accept it as the answer but all the answers are great as well in their own way. Thanks SO community.
In C++ terminology, an "object" is a value which occupies a region of storage. Here are examples of objects:
int x = 42; // x is an object of type int
int y; // y is an object of type int
char *p = new char; // p points to an object of type char
size_t s = std::string().size(); // s is an object of type size_t; there is also an unnamed temporary object of type std::string involved in the expression
Dereferencing a pointer yields an lvalue which refers to the object to which the point points. So let's take another example:
int i = 42;
int *p = &i;
*p = 314;
The expression *p is an lvalue which references the object i. After the above code, i will have the value 314.
So I'd say that both statements 1 (C++ Primer) and 2 (Wikipedia) are correct, saying the same thing in different words. While 3 is probably correct as well, I consider it somewhat misleading, as you don't normally say an object to have "contents." You could say an object has a value, or that it is a value.
Regarding this statement which was originally present in the question:
In my understanding an object is simply a memory location which holds some value which can be useful or null.
The first part is effectively correct, an object is a memory location which holds some value. However, the second part seems somewhat confused: in C++, you normally speak about null only in the context of pointers. A pointer is a type of object which holds an address of another object (or function), or a special value, the null pointer value, which means "not pointing to anything." Most objects (ints, doubles, chars, arrays, instances of class types) cannot be said to be "null".
They are all wrong. But that's ok, because they are all correct :)
The only real authority here is the standard. And the standard is very precise:
§5.3.1 Unary operators [expr.unary.op]
1 The unary * operator performs indirection: the expression to which
it is applied shall be a pointer to an object type, or a pointer to a
function type and the result is an lvalue referring to the object or
function to which the expression points. If the type of the expression
is “pointer to T,” the type of the result is “T.”
[n4296]
So there you go, the only valid definition is:
the result is an lvalue referring to the object or function to which
the expression points
Anything else is plain wrong.
Except... The standard is useful for compiler implementers and settling down debates of code conformity with the standard. When teaching or just when talking about C++ we generally aren't as rigorous as the standard. We do some simplications for the sake of ... well simplicity (or sometimes just out of ignorance). So we sometimes say "object" when the correct standard terminology would be "lvalue expression referring to an object" and so on.
I would say all the definitions are correct, they just use different terms or take different shortcuts for the sake of easy understanding.
Colloquially, and in this context in particular, value and object are often used as synonyms.
All three descriptions are correct. Wikipedia one is the most precise description out of those three.
Wikipedia says of values:
An lvalue refers to an object that persists beyond a single expression. An rvalue is a temporary value that does not persist beyond the expression that uses it.
C++ standard (draft) says of objects:
The constructs in a C ++ program create, destroy, refer to, access, and manipulate objects. An object is
created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a
union (12.3), or when a temporary object is created (7.4, 15.2). An object occupies a region of storage in its
period of construction (15.7), throughout its lifetime (6.8), and in its period of destruction (15.7). ...
And about memory location:
A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having
nonzero width. ...
In my understanding an object is simply a memory location which holds some value which can be useful or null
An object cannot be null in general - unless that object is a pointer, or some other specific type which has a special value described by the word null.
From the docs
The dereference or indirection expression has the form
* pointer-expression
If pointer-expression is a pointer to function, the result of the dereference operator is a function designator for that function.
If pointer-expression is a pointer to object, the result is an lvalue expression that designates the pointed-to object.
So as an example
int x = 5; // x is an lvalue
int* p = &x; // p is a pointer that points to x
*p = 7; // dereferencing p returns the lvalue x, which you can then operate on

Accessing field of NULL pointer to a struct works if address-of is used [duplicate]

Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).

What wording in the C++ standard allows static_cast<non-void-type*>(malloc(N)); to work?

As far as I understand the wording in 5.2.9 Static cast, the only time the result of a void*-to-object-pointer conversion is allowed is when the void* was a result of the inverse conversion in the first place.
Throughout the standard there is a bunch of references to the representation of a pointer, and the representation of a void pointer being the same as that of a char pointer, and so on, but it never seems to explicitly say that casting an arbitrary void pointer yields a pointer to the same location in memory, with a different type, much like type-punning is undefined where not punning back to an object's actual type.
So while malloc clearly returns the address of suitable memory and so on, there does not seem to be any way to actually make use of it, portably, as far as I have seen.
C++0x standard draft has in 5.2.9/13:
An rvalue of type “pointer to cv1
void” can be converted to an rvalue of
type “pointer to cv2 T,” where T is an
object type and cv2 is the same
cv-qualification as, or greater
cv-qualification than, cv1. The null
pointer value is converted to the null
pointer value of the destination type.
A value of type pointer to object
converted to “pointer to cv void” and
back, possibly with different
cv-qualification, shall have its
original value.
But also note that the cast doesn't necessarily result in a valid object:
std::string* p = static_cast<std::string*>(malloc(sizeof(*p)));
//*p not a valid object
C++03, §20.4.6p2
The contents are the same as the Standard C library header <stdlib.h>, with the following changes: [list of changes that don't apply here]
C99, §7.20.3.3p2-3
(Though C++03 is based on C89, I only have C99 to quote. However, I believe this section is semantically unchanged. §7.20.3p1 may also be useful.)
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
The malloc function returns either a null pointer or a pointer to the allocated space.
From these two quotes, malloc allocates an uninitialized object and returns a pointer to it, or returns a null pointer. A pointer to an object which you have as a void pointer can be converted to a pointer to that object (first sentence of C++03 §5.2.9p13, mentioned in the previous answer).
This should be less "handwaving", which you complained of, but someone might argue I'm "interpreting" C's definition of malloc as I wish, by, for example, noticing C says "to the allocated space" rather than "to the allocated object". To those people: first realize that "space" and "object" are synonyms in C, and second please file a defect report with the standard committees, because not even I am pedantic enough to continue. :)
I'll give you the benefit of the doubt and believe you got tripped up in the cross-references, cross-interpretation, and sometimes-confused integration between the standards, rather than "space" vs "object".
Throughout the standard there is a bunch of references to the representation of a pointer, and the representation of a void pointer being the same as that of a char pointer,
Yes, indeed.
So while malloc clearly returns the address of suitable memory and so on, there does not seem to be any way to actually make use of it, portably, as far as I have seen.
Of course there is:
void *vp = malloc (1);
char *cp;
memcpy (&cp, &vb, sizeof cp);
*cp = ' ';
There is one tiny problem : it does not work for any other type. :(