Is &*NULL well-defined in C++? - c++

In what version(s) of the C++ standards (if any) is the following well-defined?
void foo(void) {
char *nullPtr = NULL;
&*nullPtr;
}
Note that I am specifically asking about &*nullPtr here. I am aware that simply *nullPtr is undefined - but this is a separate question and hence the currently-linked "duplicate" is not a duplicate.
Note that I am not assigning the result to anything - the second line is a simple statement.
This should be a question with an obvious answer, but (as seemingly happens way too often on such questions) I have heard just as many people say the answer is "obviously undefined" as "obviously defined".
On a rather related note, what about the following? Should bar produce a read of c?
extern volatile char c;
void bar(void) {
volatile char *nonnullptr = &c;
&*nonnullptr;
}
(C version of the same question: Is &*NULL well-defined in C?)

This is four questions in one.
&*nullPtr is well-defined in C since C99, which says of the unary & operator:
If the operand is the result of a unary * operator, neither that
operator nor the & operator is evaluated and the result is as if both
were omitted, [...]
See WG14 N721 and DR076.
&*nullPtr is formally undefined in all revisions of C++ (by omission: unary & is specified to produce a pointer to "the designated object", and unary * is specified to produce "an lvalue referring to the object [...] to which the expression points"; a null pointer value points to no object), although the direction of core issue 232 is to make this well-defined.
&*nonnullptr produces no volatile read of *nonnullptr. Unary & expects an lvalue operand; no lvalue conversion (for C) or lvalue-to-rvalue conversion (for C++) is performed for *nonnullptr.
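For a pointer that does point to an object, the round trip through * and & is easy to check. This is only a minimal sketch: round_trip and round_trip_is_identity are hypothetical names, and the example says nothing about the null-pointer case discussed above.

```cpp
// Sketch only: round-trips a *valid* pointer through unary * and unary &.
// round_trip is a hypothetical helper, not something from the question.
char *round_trip(char *p) {
    return &*p;  // *p is an lvalue; & takes its address; the char is never read
}

bool round_trip_is_identity() {
    char c = 'x';
    return round_trip(&c) == &c;  // same address comes back
}
```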

Related

Is it undefined behavior to dereference nullptr like this? [duplicate]

First of all, I've seen this question about C99, and the accepted answer references the "operand is not evaluated" wording in the C99 Standard draft. I'm not sure this answer applies to C++03. There's also this question about C++ that has an accepted answer citing similar wording and also the "In some contexts, unevaluated operands appear. An unevaluated operand is not evaluated." wording.
I have this code:
int* ptr = 0;
void* buffer = malloc( 10 * sizeof( *ptr ) );
The question is - is there a null pointer dereference (and so UB) inside sizeof()?
C++03 5.3.3/1 says: "The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id."
The linked to answers cite this or similar wording and make use of "is not evaluated" part to deduce there's no UB.
However I cannot find where exactly the Standard links evaluation to having or not having UB in this case.
Does "not evaluating" the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?
I believe this is currently underspecified in the standard, like many issues such as What is the value category of the operands of C++ operators when unspecified?. I don't think it was intentional; as hvd points out, it is probably obvious to the committee.
In this specific case I think we have the evidence to show what the intention was. From GB 91 comment from the Rapperswil meeting which says:
It is mildly distasteful to dereference a null pointer as part of our specification, as we are playing on the edges of undefined behaviour. With the addition of the declval function template, already used in these same expressions, this is no longer necessary.
and suggests an alternative expression. The comment refers to this expression, which is no longer in the standard but can be found in N3090:
noexcept(*(U*)0 = declval<U>())
The suggestion was rejected because the expression is unevaluated and therefore does not invoke undefined behavior:
There is no undefined behavior because the expression is an unevaluated operand. It's not at all clear that the proposed change would be clearer.
This rationale applies to sizeof as well, since its operands are unevaluated.
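Both unevaluated contexts can be checked at compile time. A minimal sketch, assuming C++11 or later; null_ptr and element_size are hypothetical names introduced for illustration:

```cpp
#include <cstddef>
#include <utility>

int *null_ptr = nullptr;  // hypothetical null pointer, never dereferenced at run time

// sizeof only inspects the type of *null_ptr (int); nothing is read.
static_assert(sizeof(*null_ptr) == sizeof(int), "sizeof looks at the type only");

// The N3090-style expression: the operand of noexcept is unevaluated too.
static_assert(noexcept(*(int *)0 = std::declval<int>()),
              "assigning an int cannot throw");

std::size_t element_size() {
    return sizeof(*null_ptr);  // a compile-time constant
}
```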
I say underspecified but I wonder if this is covered by section 4.1 [conv.lval] which says:
The value contained in the object indicated by the lvalue is the rvalue result. When an lvalue-to-rvalue conversion occurs
within the operand of sizeof (5.3.3) the value contained in the referenced object is not accessed, since that operator
does not evaluate its operand.
It says the value contained is not accessed, which if we follow the logic of issue 232 means there is no undefined behavior:
In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior
This is somewhat speculative since the issue is not settled yet.
Since you explicitly asked for standard references - [expr.sizeof]/1:
The operand is either an expression, which is an unevaluated operand
(Clause 5), or a parenthesized type-id.
[expr]/8:
In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7,
7.1.6.2). An unevaluated operand is not evaluated.
Because the expression (i.e. the dereference) is never evaluated, it is not subject to some constraints that it would normally be violating. Only the type is inspected. In fact, the standard itself dereferences null pointers in an example in [dcl.fct]/12:
A trailing-return-type is most useful for a type that would be more
complicated to specify before the
declarator-id:
template <class T, class U> auto add(T t, U u) -> decltype(t + u);
rather than
template <class T, class U> decltype((*(T*)0) + (*(U*)0)) add(T t, U u);
— end note ]
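To see that the null-pointer form and a declval form name the same type, one can compare them with std::is_same. This is a sketch assuming C++11 or later; sum_via_null, sum_via_declval and sum_types_agree are hypothetical names:

```cpp
#include <type_traits>
#include <utility>

// The form used in the standard's note: dereferences null pointers,
// but only inside decltype, so nothing is evaluated.
template <class T, class U>
using sum_via_null = decltype((*(T *)0) + (*(U *)0));

// The same type spelled with std::declval instead.
template <class T, class U>
using sum_via_declval = decltype(std::declval<T>() + std::declval<U>());

static_assert(std::is_same<sum_via_null<int, double>, double>::value,
              "int + double has type double");

bool sum_types_agree() {
    return std::is_same<sum_via_null<int, double>,
                        sum_via_declval<int, double>>::value;
}
```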
The specification only says that dereferencing some pointer that is NULL is UB. Since sizeof() is not a real function, and it doesn't actually use the arguments for anything other than getting the type, it never references the pointer. That's WHY it works. Someone else can get all the points for looking up the spec wording that states that "the argument to sizeof doesn't get referenced".
Note that it's also entirely legal to do int arr[2]; size_t s = sizeof(arr[-111100000]); too - it doesn't matter what the index is, because sizeof never actually "does anything" to the argument passed.
Another example to show how it's "not doing anything" would be something like this:
int func()
{
int *ptr = reinterpret_cast<int*>(32);
*ptr = 7;
return 42;
}
size_t size = sizeof(func());
Again, this wouldn't crash, because func() is just resolved by the compiler to the type that it produces.
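A variant of the same idea that makes the "not called" part observable. This is just a sketch; call_count, counted and size_without_call are hypothetical names:

```cpp
#include <cstddef>

int call_count = 0;  // hypothetical counter, bumped only if counted() really runs

int counted() {
    ++call_count;
    return 42;
}

std::size_t size_without_call() {
    // Only the return type (int) is inspected; counted() is never invoked.
    return sizeof(counted());
}
```

After calling size_without_call(), call_count is still 0.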
Equally, if sizeof actually "does something" with the argument, what would happen when you do this:
char *buffer = new char[sizeof(char[10000000000])];
Would it create a 10000000000-byte stack allocation, then give the size back after it crashed the code because there aren't enough megabytes of stack? [In some systems, stack size is counted in bytes, not megabytes]. And whilst nobody writes code like that, you could easily come up with something similar using a typedef of either buffer_type as an array of char, or some kind of struct with large content.

Why are multiple pre-increments allowed in C++ but not in C? [duplicate]

Why is
int main()
{
int i = 0;
++++i;
}
valid C++ but not valid C?
C and C++ say different things about the result of prefix ++. In C++:
[expr.pre.incr]
The operand of prefix ++ is modified by adding 1. The operand shall be
a modifiable lvalue. The type of the operand shall be an arithmetic
type other than cv bool, or a pointer to a completely-defined object
type. The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field. The expression ++x is
equivalent to x+=1.
So ++ can be applied on the result again, because the result is basically just the object being incremented and is an lvalue. In C however:
6.5.3 Unary operators
The operand of the prefix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
The value of the operand of the prefix ++ operator is incremented. The
result is the new value of the operand after incrementation.
The result is not an lvalue; it's just the pure value of the incrementation. So you can't apply any operator that requires an lvalue on it, including ++.
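On the C++ side, the lvalue result can be observed directly. A minimal sketch, assuming C++11 or later (where ++++i on an int is well-defined); the function names are hypothetical:

```cpp
int twice_incremented() {
    int i = 0;
    ++++i;  // parsed as ++(++i); the inner result is the lvalue i itself
    return i;
}

bool prefix_result_is_operand() {
    int i = 0;
    int *p = &++i;  // taking the address is allowed because ++i is an lvalue
    return p == &i && i == 1;
}
```

The same two functions are rejected by a C compiler, since neither ++ nor & accepts a non-lvalue operand.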
If you are ever told that C++ and C are a superset or subset of each other, know that it is not the case. There are many differences that make that assertion false.
In C, it's always been that way. Possibly because pre-incremented ++ can be optimised to a single machine code instruction on many CPUs, including ones from the 1970s which was when the ++ concept developed.
In C++ though there's the symmetry with operator overloading to consider. To match C, the canonical pre-increment ++ would need to return const &, unless you had different behaviour for user-defined and built-in types (which would be a smell). Restricting the return to const & is a contrivance. So the return of ++ gets relaxed from the C rules, at the expense of increased compiler complexity in order to exploit any CPU optimisations for built-in types.
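For illustration, here is what the canonical user-defined pre-increment looks like. Counter and chained_value are hypothetical names, and this is a sketch of the usual convention rather than anything mandated by the language:

```cpp
// Hypothetical user-defined type with the conventional prefix ++ signature.
struct Counter {
    int value = 0;

    // Returns a non-const reference to the object itself,
    // mirroring what the built-in operator yields in C++.
    Counter &operator++() {
        ++value;
        return *this;
    }
};

int chained_value() {
    Counter c;
    ++++c;  // c.operator++().operator++()
    return c.value;
}
```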
I assume you understand why it's fine in C++ so I'm not going to elaborate on that.
For whatever it's worth, here's my test result:
t.c:6:2: error: lvalue required as increment operand
++ ++c;
^
Regarding CppReference:
Non-lvalue object expressions
Colloquially known as rvalues, non-lvalue object expressions are the expressions of object types that do not designate objects, but rather values that have no object identity or storage location. The address of a non-lvalue object expression cannot be taken.
The following expressions are non-lvalue object expressions:
all operators not specified to return lvalues, including
increment and decrement operators (note: pre- forms are lvalues in C++)
And Section 6.5.3.1 from n1570:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation.
So in C, the result of the prefix increment and prefix decrement operators is not required to be an lvalue, and thus cannot be incremented again. In fact, such wording can be understood as "required to be an rvalue".
The other answers explain the way that the standards diverge in what they require. This answer provides a motivating example in the area of difference.
In C++, you can have a function like int& foo(int&);, which has no analog in C. It is useful (and not onerous) for C++ to have the option of foo(foo(x));.
Imagine for a moment that operations on basic types were defined somewhere, e.g. int& operator++(int&);. ++++x itself is not a motivating example, but it fits the pattern of foo above.
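A sketch of that pattern follows. The body given to foo here (adding 1) is an assumption made for illustration, and foo_twice is a hypothetical helper:

```cpp
// Takes and returns a reference, so calls can be nested on the same object.
// The increment in the body is an assumption; the answer only gives the signature.
int &foo(int &x) {
    x += 1;
    return x;
}

int foo_twice() {
    int x = 0;
    foo(foo(x));  // the inner call returns x itself, so the outer call modifies x again
    return x;
}
```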

Semantics of unary & on numeric literal

What is the unary-& doing here?
int * a = 1990;
int result = &5[a];
If you were to print result you would get the value 2010.
You have to compile it with -fpermissive or it will stop due to errors.
In C, x [y] and y [x] are identical. So &5[a] is the same as &a[5].
&5[a] is the same as &a[5] and the same as a + 5. In your case it's undefined behavior because a points to nowhere.
C11 standard chapter 6.5.6 Additive operators/8 (the same in C++):
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
"...unary & on numeric literal"?
Postfix operators in C always have higher priority than prefix ones. In case of &5[a], the [] has higher priority than the &. Which means that in &5[a] the unary & is not applied to "numeric literal" as you seem to incorrectly believe. It is applied to the entire 5[a] subexpression. I.e. &5[a] is equivalent to &(5[a]).
As for what 5[a] means - this is a beaten-to-death FAQ. Look it up.
And no, you don't have "to compile it with -fpermissive" (my compiler tells me it doesn't even know what -fpermissive is). You have to figure out that this
int * a = 1990;
is not legal code in either C or C++. If anything, it requires an explicit cast
int * a = (int *) 1990;
not some obscure switch of some specific compiler you happened to be using at the moment. The same applies to another illegal initialization in int result = &5[a].
Finally, even if we overlook the illegal code and the undefined behavior triggered by that 5[a], the behavior of this code will still be highly implementation-dependent. I.e. the answer is no: in the general case you will not get 2010 in result.
You cannot apply the unary & operator to an integer literal, because a literal is not an lvalue.
Due to operator precedence, your code doesn't do that. Since the indexing operator [] binds more tightly than unary &, &5[a] is equivalent to &(5[a]).
Here's a program similar to yours, except that it's valid code, not requiring -fpermissive to compile:
#include <stdio.h>
int main(void) {
int arr[6];
int *ptr1 = arr;
int *ptr2 = &5[ptr1];
printf("%p %p\n", ptr1, ptr2);
}
As explained in this question and my answer, the indexing operator is commutative (because it's defined in terms of addition, and addition is commutative), so 5[a] is equivalent to a[5]. So the expression &5[ptr1] computes the address of element 5 of arr.
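The equivalences can also be checked without printing addresses. A minimal sketch with a hypothetical function name, using a real six-element array so that element 5 exists:

```cpp
bool index_forms_agree() {
    int arr[6] = {};
    int *p = arr;
    // &5[p] parses as &(5[p]); 5[p] is *(5 + p), which is the same as p[5].
    return &5[p] == &p[5] && &p[5] == p + 5;
}
```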
In your program:
int * a = 1990;
int result = &5[a];
the initialization of a is invalid because a is of type int* and 1990 is of type int, and there is no implicit conversion from int to int*. Likewise, the initialization of result is invalid because &5[a] is of type int*. Apparently -fpermissive causes the compiler to violate the rules of the language and permit these invalid implicit conversions.
At least in the version of gcc I'm using, the -fpermissive option is valid only for C++ and Objective-C, not for C. In C, gcc permits such implicit conversions (with a warning) anyway. I strongly recommend not using this option. (Your question is tagged both C and C++. Keep in mind that C and C++ are two distinct, though closely related, languages. They happen to behave similarly in this case, but it's usually best to pick one language or the other.)

Why are multiple increments/decrements valid in C++ but not in C?

test.(c/cpp)
#include <stdio.h>
int main(int argc, char** argv)
{
int a = 0, b = 0;
printf("a = %d, b = %d\n", a, b);
b = (++a)--;
printf("a = %d, b = %d\n", a, b);
return 0;
}
If I save the above as a .cpp file, it compiles and outputs this upon execution:
a = 0, b = 0
a = 0, b = 1
However, if I save it as a .c file, I get the following error:
test.c:7:12: error: lvalue required as decrement operator.
Shouldn't the (++a) operation be resolved before the (newValue)-- operation? Does anyone have any insight on this?
In C the result of the prefix and postfix increment/decrement operators is not an lvalue.
In C++ the result of the postfix increment/decrement operator is also not an lvalue but the result of the prefix increment/decrement operator is an lvalue.
Now doing something like (++a)-- in C++ is undefined behavior because you are modifying an object value twice between two sequence points.
EDIT: following up on @bames53's comment. It is undefined behavior in C++98/C++03, but the changes in C++11 to the idea of sequence points now make this expression defined.
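The value categories described above can be checked with decltype, which yields T& for an lvalue expression and T for a prvalue. A sketch assuming C++11 or later; probe is a hypothetical variable:

```cpp
#include <type_traits>

int probe = 0;  // hypothetical; decltype does not evaluate its operand, so this stays 0

// Prefix ++ yields an lvalue, so decltype gives int&.
static_assert(std::is_same<decltype(++probe), int &>::value,
              "prefix ++ is an lvalue in C++");

// Postfix ++ yields a prvalue, so decltype gives plain int.
static_assert(std::is_same<decltype(probe++), int>::value,
              "postfix ++ is not an lvalue");
```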
In C and C++, there are lvalue expressions which may be used on the left-hand side of the = operator and rvalue expressions which may not. C++ allows more things to be lvalues because it supports reference semantics.
++ a = 3; /* makes sense in C++ but not in C. */
The increment and decrement operators are similar to assignment, since they modify their argument.
In C++03, (++a)-- would cause undefined behavior because two operations which are not sequenced with respect to each other are modifying the same variable. (Even though one is "pre" and one is "post", they are unsequenced because there is no ,, &&, ?, or such.)
In C++11, the expression now does what you would expect. C11 makes no corresponding change: in C the result of ++a is not an lvalue, so the expression is rejected at compile time.
For anybody who might want the precise details of the differences as they're stated in the standards, C99, §6.5.3/2 says:
The value of the operand of the prefix ++ operator is incremented. The result is the new
value of the operand after incrementation.
By contrast, C++11, §5.3.2/1 says:
The result is the updated operand; it is an lvalue, and it is a bit-field if
the operand is a bit-field.
[emphasis added, in both cases]
Also note that although (++a)-- gives undefined behavior (at least in C++03) when a is an int, if a is some user-defined type, so you're using your own overloads of ++ and --, the behavior will be defined -- in such a case, you're getting the equivalent of:
a.operator++().operator--(0);
Since each operator results in a function call (which can't overlap) you actually do have sequence points to force defined behavior (note that I'm not recommending its use, only noting that the behavior is actually defined in this case).
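A sketch of the user-defined case, with hypothetical names Step, Result and run. The overloads follow the usual conventions (prefix returns a reference, postfix returns the old value), which is an assumption about how such a type would be written:

```cpp
// Hypothetical type whose ++ and -- are ordinary function calls.
struct Step {
    int value = 0;

    Step &operator++() {  // prefix: modify, then return the object itself
        ++value;
        return *this;
    }

    Step operator--(int) {  // postfix: return a copy of the old state
        Step old = *this;
        --value;
        return old;
    }
};

struct Result {
    int a;
    int b;
};

Result run() {
    Step a;
    Step b = (++a)--;  // a.operator++().operator--(0)
    return {a.value, b.value};
}
```

Here run() yields a == 0 and b == 1, the same values the int version in the question printed.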
§5.2.6 Increment and decrement:
The value of a postfix ++ expression is the value of its operand. [ ... ]  The operand shall be a modifiable lvalue.
The error you get in your C compilation helps to suggest that this is only a feature present in C++.

Why pre-increment operator gives rvalue in C?

In C++, the pre-increment operator yields an lvalue because the incremented object itself is returned, not a copy.
But in C, it yields an rvalue. Why?
C doesn't have references. In C++, ++i returns a reference to i (an lvalue), whereas in C it returns a copy (the incremented value).
C99 6.5.3.1/2
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).
‘‘value of an expression’’ <=> rvalue
However for historical reasons I think "references not being part of C" could be a possible reason.
C99 says in the footnote (of section §6.3.2.1),
The name ‘‘lvalue’’ comes originally
from the assignment expression E1 =
E2, in which the left operand E1 is
required to be a (modifiable) lvalue.
It is perhaps better considered as
representing an object ‘‘locator
value’’. What is sometimes called
‘‘rvalue’’ is in this International
Standard described as the ‘‘value of
an expression’’.
Hope that explains why ++i in C returns an rvalue.
As for C++, I would say it depends on the object being incremented. If the object's type is some user-defined type, then it may always return lvalue. That means, you can always write i++++++++ or ++++++i if type of i is Index as defined here:
Undefined behavior and sequence points reloaded
Off the top of my head, I can't imagine any useful statements that could result from using a pre-incremented variable as an lvalue. In C++, due to the existence of operator overloading, I can. Do you have a specific example of something that you're prevented from doing in C, due to this restriction?