C/C++ nullptr dereference [duplicate] - c++

This question already has answers here:
Is there a platform or situation where dereferencing (but not using) a null pointer to make a null reference will behave badly?
(6 answers)
Closed 6 years ago.
Since de-referencing nullptr (NULL) is an undefined behavior both in C and C++, I am wondering if expression &(*ptr) is a valid one if ptr is nullptr (NULL).
If it is also an undefined behavior, how does OFFSETOF macro in the linked answer work?
I always thought that ptr->field is a shorthand for (*ptr).field
I think the answer to my question is similar in C and C++.

TL;DR &(*(char*)0) is well defined.
The C++ standard doesn't say that indirection of null pointer by itself has UB. Current standard draft, [expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”. [snip]
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. [snip]
There is no UB unless the lvalue of the indirection expression is converted to an rvalue.
The C standard is much more explicit. C11 standard draft §6.5.3.2
The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the
& operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

If it is also an undefined behavior, how does offsetof work?
Prefer using the standard offsetof macro. Home-grown versions result in compiler warnings. Moreover:
offsetof is required to work as specified above, even if unary operator& is overloaded for any of the types involved. This cannot be implemented in standard C++ and requires compiler support.
offsetof is a built-in function in gcc.

Related

Is member access on a null pointer defined in C++?

Is address computation on a null pointer defined behavior in C++? Here's a simple example program.
struct A { int x; };
int main() {
A* p = nullptr;
&(p->x); // is this undefined behavior?
return 0;
}
Thanks.
EDIT Subscripting is covered in this other question.
&(p->x); // is this undefined behavior?
Standard is a bit vague regarding this:
[expr.ref] ... The expression E1->E2 is converted to the equivalent form (*(E1)).E2;
[expr.unary.op] The unary * operator ... the result is an lvalue referring to the object ... to which the expression points.
There is no explicit mention of UB in the section. The quoted rule does appear to conflict with the fact that the null pointer doesn't point to any object. This could be interpreted that yes, behaviour is undefined.
[expr.unary.op] The result of the unary & operator is a pointer to its operand. ... if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object ([intro.memory]).
Again, no designated object exists. Note that at no point is the operand lvalue converted to an rvalue, which would definitely have been UB.
Back in 2000 there was CWG issue to clarify whether indirection through null is undefined. The proposed resolution (2004), that would clarify that indirection through null is not UB, appears to not have been added to the standard so far.
However whether it is or isn't UB doesn't matter much since you don't need to do this. At the very least, the resulting pointer will be invalid and thus useless.
If you were planning to convert the pointer to an integer to get the offset of the member, there is no need to do this because you can instead us the offsetof macro from the standard library, which doesn't have UB.
&(p[1]); // undefined?
Here, behaviour is quite clearly undefined:
[expr.sub] ... The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
[expr.add] When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0 (does not apply)
Otherwise, if P points to an array element (does not apply)
Otherwise, the behavior is undefined.
&(p[0]); // undefined?
As per previous rules, the first option applies:
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
And now we are back to the question of whether indirection through this null is UB. See the beginning of the answer.
Still, doesn't really matter. There is no need to write this, since this is simply unnecessarily complicated way to write sizeof(int) * i (with i being 1 and 0 respectively).

Why are multiple pre-increments allowed in C++ but not in C? [duplicate]

This question already has answers here:
Why are multiple increments/decrements valid in C++ but not in C?
(4 answers)
Closed 5 years ago.
Why is
int main()
{
int i = 0;
++++i;
}
valid C++ but not valid C?
C and C++ say different things about the result of prefix ++. In C++:
[expr.pre.incr]
The operand of prefix ++ is modified by adding 1. The operand shall be
a modifiable lvalue. The type of the operand shall be an arithmetic
type other than cv bool, or a pointer to a completely-defined object
type. The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field. The expression ++x is
equivalent to x+=1.
So ++ can be applied on the result again, because the result is basically just the object being incremented and is an lvalue. In C however:
6.5.3 Unary operators
The operand of the prefix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
The value of the operand of the prefix ++ operator is incremented. The
result is the new value of the operand after incrementation.
The result is not an lvalue; it's just the pure value of the incrementation. So you can't apply any operator that requires an lvalue on it, including ++.
If you are ever told the C++ and C are superset or subset of each other, know that it is not the case. There are many differences that make that assertion false.
In C, it's always been that way. Possibly because pre-incremented ++ can be optimised to a single machine code instruction on many CPUs, including ones from the 1970s which was when the ++ concept developed.
In C++ though there's the symmetry with operator overloading to consider. To match C, the canonical pre-increment ++ would need to return const &, unless you had different behaviour for user-defined and built-in types (which would be a smell). Restricting the return to const & is a contrivance. So the return of ++ gets relaxed from the C rules, at the expense of increased compiler complexity in order to exploit any CPU optimisations for built-in types.
I assume you understand why it's fine in C++ so I'm not going to elaborate on that.
For whatever it's worth, here's my test result:
t.c:6:2: error: lvalue required as increment operand
++ ++c;
^
Regarding CppReference:
Non-lvalue object expressions
Colloquially known as rvalues, non-lvalue object expressions are the expressions of object types that do not designate objects, but rather values that have no object identity or storage location. The address of a non-lvalue object expression cannot be taken.
The following expressions are non-lvalue object expressions:
all operators not specified to return lvalues, including
increment and decrement operators (note: pre- forms are lvalues in C++)
And Section 6.5.3.1 from n1570:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation.
So in C, the result of prefix increment and prefix decrement operators are not required to be lvalue, thus not incrementable again. In fact, such word can be understood as "required to be rvalue".
The other answers explain the way that the standards diverge in what they require. This answer provides a motivating example in the area of difference.
In C++, you can have a function like int& foo(int&);, which has no analog in C. It is useful (and not onerous) for C++ to have the option of foo(foo(x));.
Imagine for a moment that operations on basic types were defined somewhere, e.g. int& operator++(int&);. ++++x itself is not a motivating example, but it fits the pattern of foo above.

C++ member access/indirection operator equivalence

I was looking at the C++ standard regarding member reference operators (the unary * dereferencing operator, the -> member accessor operator) as well as many other related questions:
C++ - Difference between (*). and ->?
ptr->hello(); /* VERSUS */ (*ptr).hello();
C++ pointers difference between * and ->
I saw that most answers stated that p->m is syntactic sugar for (*p).m as defined by the C++ Standard (5.2.5, paragraph 2):
The expression E1->E2 is converted to the equivalent form (*(E1)).E2
Many comments also noted that because operator* and operator-> are overloadable in classes, they should be overloaded uniformly to ensure consistent behavior.
These statements seem to contradict each other: if (as per the standard) E1->E2 is converted to the equivalent form (*(E1)).E2, then what would be the purpose of overloading operator-> (as is permitted by the standard)?
Simpler stated, are these two parts of the standard in conflict, or am I misunderstanding the Standard?
Does the E1->E2 equivalence transformation to (*(E1)).E2 apply to all complete types or only to built in ones?
The conversion from E1 -> E2 to (*(E1)).E2 only applies to raw pointer types. For class types, E1 -> E2 evaluates to (E1).operator->().E2, which potentially might recursively expand out even more copies of operator-> if the return type of operator-> is not itself a pointer type. You can see this by creating a type that supports operator* but not operator-> and trying to use the arrow operator on it; you'll get an error that operator-> is undefined.
As a follow-up, it's common to implement operator -> in terms of operator * in a way that makes the semantics of -> match the semantics for pointers. You often see things like this:
PointerType ClassType::operator-> () const {
return &**this;
}
This expression is interpreted as
&(*(*this)),
meaning "take this object (*this), dereference it (*(*this)), and get the address of what you find (&(*(*this)).)." Now, if you use the rule that E1 -> E2 should be equivalent to (*(E1)).E2, you can see that you end up getting something equivalent.

L-Value, Pointer arithmetic [duplicate]

This question already has answers here:
Why ++i++ gives "L-value required error" in C? [duplicate]
(5 answers)
Closed 9 years ago.
I am looking for an explanation of how lines L1 and L2 in the code snippet below differ w.r.t l-values, i.e, Why am I getting the: C2105 error in L1, but not in L2?
*s = 'a';
printf("%c\n", *s );
//printf("%c\n", ++(*s)++ ); //L1 //error C2105: '++' needs l-value
printf("%c\n", (++(*s))++); //L2
printf("%c\n", (*s) );
Note: I got the above result when the code was compiled as a .cpp file. Now, on compilation as .c file, I get the same error C2105 on both lines L1 and L2. Why does L2 compile in C++, and not in C is another mystery :(.
If its of any help, I'm using Visual C++ Express Edition.
Compiler see ++(*s)++ as ++((*s)++), as post-increment has higher precedence than pre-increment. After post-incrementation, (*s)++ becomes an r-value and it can't be further pre-incremented (here).
And yes it is not a case of UB (at least in C).
And also read this answer.
For L2 in C++ not giving error because
C++11: 5.3.2 Increment and decrement says:
The operand of prefix ++ is modified by adding 1, or set to true if it is bool (this use is deprecated). The
operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a completely-defined object type. The result is the updated operand; it is an lvalue, and it is a bit-field if the operand is a bit-field. If x is not of type bool, the expression ++x is equivalent to x+=1.
C++11:5.2.6 Increment and decrement says:
The value of a postfix ++ expression is the value of its operand. [ Note: the value obtained is a copy of the original value —end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete object type. The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix
++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. —end note ] The result is a
prvalue. The type of the result is the cv-unqualified version of the type of the operand.
and also on MSDN site it is stated that:
The operands to postfix increment and postfix decrement operators must be modifiable (not const) l-values of arithmetic or pointer type. The type of the result is the same as that of the postfix-expression, but it is no longer an l-value.
For completeness and quotes from C++ documentation:
Looking at L2, the prefix-increment/decrement returns an l-value. Hence, no error when performing the post-increment. From the C++ documentation for prefix-increment/decrement:
The result is an l-value of the same type as the operand.
Looking at L1, it becomes:
++( ( *s )++ )
... after operand precedence. The post-increment operator by definition evaluates the expression (returning an r-value) and then mutates it. From C++ documentation for post-increment/decrement:
The type of the result is the same as that of the postfix-expression, but it is no longer an l-value.
... and you cannot prefix-increment/decrement an r-value, hence error.
References:
Postfix Operand Doc.
Prefix Operand Doc.

Why pre-increment operator gives rvalue in C?

In C++, pre-increment operator gives lvalue because incremented object itself is returned, not a copy.
But in C, it gives rvalue. Why?
C doesn't have references. In C++ ++i returns a reference to i (lvalue) whereas in C it returns a copy(incremented).
C99 6.5.3.1/2
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++Eis equivalent to (E+=1).
‘‘value of an expression’’ <=> rvalue
However for historical reasons I think "references not being part of C" could be a possible reason.
C99 says in the footnote (of section $6.3.2.1),
The name ‘‘lvalue’’ comes originally
from the assignment expression E1 =
E2, in which the left operand E1 is
required to be a (modifiable) lvalue.
It is perhaps better considered as
representing an object ‘‘locator
value’’. What is sometimes called
‘‘rvalue’’ is in this International
Standard described as the ‘‘value of
an expression’’.
Hope that explains why ++i in C, returns rvalue.
As for C++, I would say it depends on the object being incremented. If the object's type is some user-defined type, then it may always return lvalue. That means, you can always write i++++++++ or ++++++i if type of i is Index as defined here:
Undefined behavior and sequence points reloaded
Off the top of my head, I can't imagine any useful statements that could result from using a pre-incremented variable as an lvalue. In C++, due to the existence of operator overloading, I can. Do you have a specific example of something that you're prevented from doing in C, due to this restriction?