I'm using N3936 as a reference here (please correct this question if any of the C++14 text differs).
Under 3.10 Lvalues and rvalues we have:
Every expression belongs to exactly one of the fundamental classifications in this taxonomy: lvalue, xvalue, or prvalue.
However the definition of lvalue reads:
An lvalue [...] designates a function or an object.
In 4.1 Lvalue-to-rvalue conversion the text appears:
[...] In all other cases, the result of the conversion is determined according to the following rules:
[...]
Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.
My question is: what happens in code where the lvalue does not designate an object? There are two canonical examples:
Example 1:
int *p = nullptr;
*p;
int &q = *p;
int a = *p;
Example 2:
int arr[4];
int *p = arr + 4;
*p;
int &q = *p;
std::sort(arr, &q);
Which lines (if any) are ill-formed and/or cause undefined behaviour?
Referring to Example 1: is *p an lvalue? According to my first quote it must be. However, my second quote excludes it since *p does not designate an object. (It's certainly not an xvalue or a prvalue either).
But if you interpret my second quote to mean that *p is actually an lvalue, then it is not covered at all by the lvalue-to-rvalue conversion rules. You may take the catch-all rule that "anything not defined by the Standard is undefined behaviour" but then you must permit null references to exist, so long as there is no lvalue-to-rvalue conversion performed.
History: This issue was raised in DR 232. In C++11 the resolution from DR 232 did in fact appear. Quoting from N3337 Lvalue-to-rvalue conversion:
If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
which still appears to permit null references to exist - it only clears up the issue of performing lvalue-to-rvalue conversion on one. Also discussed on this SO thread
The resolution from DR232 no longer appears in N3797 or N3936 though.
It isn't possible to create a reference to null or a reference to the off-the-end element of an array, because section 8.3.2 says (reading from draft n3936) that
A reference shall be initialized to refer to a valid object or function.
However, it is not clear that forming an expression with a value category of lvalue constitutes "initialization of a reference". Quite the contrary, in fact, temporary objects are objects, and references are not objects, so it cannot be said that *(a+n) initializes a temporary object of reference type.
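For Example 2, one way to sidestep the question is never to form the lvalue for the past-the-end position at all, and to hand the past-the-end pointer value itself to the algorithm. The following is only a minimal sketch; the helper name sort_four is hypothetical and not part of the question:

```cpp
#include <algorithm>
#include <array>
#include <iterator>

// Hypothetical helper: sorts a 4-element built-in array using only
// pointer values, so no lvalue for the past-the-end position is formed.
std::array<int, 4> sort_four(int a, int b, int c, int d)
{
    int arr[4] = {a, b, c, d};
    // std::begin/std::end yield arr and arr + 4; arr + 4 is never dereferenced here.
    std::sort(std::begin(arr), std::end(arr));
    return {arr[0], arr[1], arr[2], arr[3]};
}
```

Because the end of the range is expressed as a pointer value rather than as &q, the code does not depend on how the open issues about *p are eventually resolved.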
I think the answer to this, although probably not the answer you really want, is that this is under-specified or ill-specified, and therefore we cannot really say whether the examples you have provided are ill-formed or invoke undefined behavior according to the current draft standard.
We can see this by looking at DR 232 and DR 453.
DR 232 tells us that the standard conflicts on whether dereferencing a null pointer is undefined behavior:
At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says
and introduces the concept of an empty lvalue, which is the result of indirection on a null pointer or on a pointer one past the end of an array:
if any. If the pointer is a null pointer value (4.10 [conv.ptr]) or
points one past the last element of an array object (5.7 [expr.add]),
the result is an empty lvalue and does not refer to any object or
function.
and proposes that the lvalue-to-rvalue conversion of such an lvalue is undefined behavior.
DR 453 tells us that we don't know what a valid object is:
What is a "valid" object? In particular the expression "valid object"
seems to exclude uninitialized objects, but the response to Core Issue
363 clearly says that's not the intent.
and suggests that binding a reference to an empty lvalue is undefined behavior:
If an lvalue to which a reference is directly bound designates neither
an existing object or function of an appropriate type (8.5.3
[dcl.init.ref]), nor a region of memory of suitable size and alignment
to contain an object of the reference's type (1.8 [intro.object], 3.8
[basic.life], 3.9 [basic.types]), the behavior is undefined.
and includes the following examples in the proposal:
int& f(int&);
int& g();
extern int& ir3;
int* ip = 0;
int& ir1 = *ip; // undefined behavior: null pointer
int& ir2 = f(ir3); // undefined behavior: ir3 not yet initialized
int& ir3 = g();
int& ir4 = f(ir4); // ill-formed: ir4 used in its own initializer
So if we want to restrict ourselves to the intent, then I feel that DR 232 and DR 453 provide the information we need: the intention is that lvalue-to-rvalue conversion of the result of dereferencing a null pointer is undefined behavior, and that binding a reference to such a result, or to an indeterminate value, is also undefined behavior.
Although both of these defect reports have taken a while to resolve, both are still active with relatively recent updates, and so far the committee apparently does not disagree with the main premise that the reported defects are real. It follows that, without taking these two reports into account, it is not possible to answer your question from the current draft standard alone.
Consider the following scenario:
std::array<int, 8> a;
auto p = reinterpret_cast<int(*)[8]>(a.data());
(*p)[0] = 42;
Is this undefined behavior? I think it is.
a.data() returns an int*, which is not the same as int(*)[8]
The type aliasing rules on cppreference seem to suggest that the reinterpret_cast is not valid
As a programmer, I know that the memory location pointed by a.data() is an array of 8 int objects
Is there any rule I am missing that makes this reinterpret_cast valid?
An array object and its first element are not pointer-interconvertible*, so the result of the reinterpret_cast is a pointer of type "pointer to array of 8 int" whose value is "pointer to a[0]" [1]. In other words, despite the type, it does not actually point to any array object.
The code then applies the array-to-pointer conversion to the lvalue that resulted from dereferencing such a pointer (as a part of the indexing expression (*p)[0]) [2]. That conversion's behavior is only specified when the lvalue actually refers to an array object [3]. Since the lvalue in this case does not, the behavior is undefined by omission [4].
* If the question is "why is an array object and its first element not pointer-interconvertible?", it has already been asked: Pointer interconvertibility vs having the same address.
[1] See [expr.reinterpret.cast]/7, [conv.ptr]/2, [expr.static.cast]/13 and [basic.compound]/4.
[2] See [basic.lval]/6, [expr.sub] and [expr.add].
[3] [conv.array]: "The result is a pointer to the first element of the array."
[4] [defns.undefined]: undefined behavior is "behavior for which this document imposes no requirements", including "when this document omits any explicit definition of behavior".
Yes the behaviour is undefined.
int* (the return type of a.data()) is a different type from int(*)[8], so you are breaking strict aliasing rules.
Naturally though (and this is more for the benefit of future readers),
int* p = a.data();
is perfectly valid, as is the ensuing expression p + n, where n is an integer between 0 and 8 inclusive.
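As a rough illustration of the valid alternative, the sketch below works through the int* returned by data() and only dereferences p + n for n in [0, 8). The helper name fill_and_sum is hypothetical:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: writes and reads through the int* returned by data(),
// staying within [p, p + 8) so every dereferenced pointer designates an element.
int fill_and_sum(std::array<int, 8>& a)
{
    int* p = a.data();
    int sum = 0;
    for (std::size_t n = 0; n < a.size(); ++n) {
        p[n] = 42;      // same as *(p + n), with n in [0, 8)
        sum += p[n];
    }
    // p + 8 may be formed and compared against, but must not be dereferenced.
    return sum;
}
```

No cast is involved, so the question of whether an int(*)[8] actually points to an array object never arises.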
Try as I might, the closest answer I've seen is this, with two completely opposing answers(!)
The question is simple, is this legal?
auto p = reinterpret_cast<int*>(0xbadface);
*p; // legal?
My take on the matter
Casting integer to pointer: no restrictions on what may be cast
Indirection: only states the result is a lvalue.
Lifetimes: only states what can't be done on objects, there is no object here
Expression statements: *p is a discarded value expression
Discarded value expressions: no lvalue-to-rvalue conversion occurs
Undefined-ness of lvalues: aka strict aliasing rule, only if the lvalue is converted to a rvalue
So I conclude there is nothing explicitly saying this is undefined behaviour. Yet I distinctly remember that some platforms trap on indirection for invalid pointers. What went wrong with my reasoning?
[basic.compound] says:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value ([conv.ptr]) for that type, or
an invalid pointer value.
By the process of elimination we can deduce that p is an invalid pointer value.
[basic.stc] says:
Indirection through an invalid pointer value and passing an invalid
pointer value to a deallocation function have undefined behavior. Any
other use of an invalid pointer value has implementation-defined
behavior.
Since [expr.unary.op] says that the indirection operator performs indirection, I would say that the expression *p causes UB whether or not the result is used.
... some platforms trap on indirection for invalid pointers.
Most platforms trap on invalid address access. This does not contradict the issue in any way. The question of what happens in *p; boils down to whether an attempt to actually fetch at an invalid address takes place or not.
The question of fetching is very similar to the core issue 232 (indirection through a null pointer). As you have already pointed out, *p; is a discarded value expression, and as such no lvalue-to-rvalue conversion ("fetching") takes place:
Tom Plum:
...it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior.
And subsequently:
Notes from the October 2003 meeting:
We agreed that the approach in the standard seems okay: p = 0; *p; is
not inherently an error. An lvalue-to-rvalue conversion would give it
undefined behavior.
As to whether or not reinterpret_cast<int*>(0xbadface) produces a valid pointer, indeed in implementations with strict pointer safety, it wouldn't be a safely-derived pointer, and as such is invalid and any use of it is UB.
But in case of relaxed pointer safety the resulting pointer is valid (otherwise it would be impossible to use pointers returned from binary libraries and components written in C or other languages).
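By contrast with an arbitrary constant such as 0xbadface, one integer-to-pointer conversion is guaranteed to yield a usable pointer: converting a pointer to an integer type large enough to hold it and back to the same pointer type gives the original value. A minimal sketch, assuming the implementation provides the optional type std::uintptr_t; both function names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: converts a pointer to an integer and back again.
// Assumes std::uintptr_t exists on this implementation (it is optional).
bool round_trip_preserves(int* original)
{
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(original);
    int* restored = reinterpret_cast<int*>(bits);
    return restored == original;
}

// Hypothetical helper: reads through a pointer recovered from its integer form.
int read_through_round_trip()
{
    int value = 7;
    auto bits = reinterpret_cast<std::uintptr_t>(&value);
    int* restored = reinterpret_cast<int*>(bits);
    return *restored;   // restored points to value, so this read is fine
}
```

The difference from the question's snippet is that the integer here was itself derived from a pointer to a live object.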
I recently tried the following program; it compiles, runs fine, and produces the expected output instead of a runtime error.
#include <iostream>
class demo
{
public:
static void fun()
{
std::cout<<"fun() is called\n";
}
static int a;
};
int demo::a=9;
int main()
{
demo* d=nullptr;
d->fun();
std::cout<<d->a;
return 0;
}
If an uninitialized pointer is used to access class and/or struct members, the behaviour is undefined, so why is it also allowed to access static members through a null pointer? Is there any harm in my program?
TL;DR: Your example is well-defined. Merely dereferencing a null pointer is not invoking UB.
There is a lot of debate over this topic, which basically boils down to whether indirection through a null pointer is itself UB.
The only questionable thing that happens in your example is the evaluation of the object expression. In particular, d->a is equivalent to (*d).a according to [expr.ref]/2:
The expression E1->E2 is converted to the equivalent form
(*(E1)).E2; the remainder of 5.2.5 will address only the first
option (dot).
*d is just evaluated:
The postfix expression before the dot or arrow is evaluated;65 the
result of that evaluation, together with the id-expression, determines
the result of the entire postfix expression.
65) If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary
to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.
Let's extract the critical part of the code. Consider the expression statement
*d;
In this statement, *d is a discarded value expression according to [stmt.expr]. So *d is solely evaluated (see footnote 1), just as in d->a.
Hence if *d; is valid, that is, if merely evaluating the expression *d is valid, then so is your example.
Does indirection through null pointers inherently result in undefined behavior?
There is the open CWG issue #232, created over fifteen years ago, which concerns this exact question. A very important argument is raised. The report starts with
At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."
Note that the example mentioned was changed to cover modifications of const objects instead, and the note in [dcl.ref] - while still existing - is not normative. The normative passage was removed to avoid commitment.
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says
If the lvalue expression is obtained by applying the unary * operator
to a pointer and the pointer is a null pointer value (4.10
[conv.ptr]), the typeid expression throws the bad_typeid exception
(18.7.3 [bad.typeid]).
This is inconsistent and should be cleaned up.
The last point is especially important. The quote in [expr.typeid] still exists and appertains to glvalues of polymorphic class type, which is the case in the following example:
#include <iostream>
#include <typeinfo>
int main() try {
// Polymorphic type
class A
{
virtual ~A(){}
};
typeid( *((A*)0) );
}
catch (const std::bad_typeid&)
{
std::cerr << "bad_typeid\n";
}
The behavior of this program is well-defined (an exception will be thrown and caught), and the expression *((A*)0) is evaluated as it isn't part of an unevaluated operand. Now if indirection through null pointers induced UB, then the expression written as
*((A*)0);
would be doing just that, inducing UB, which seems nonsensical when compared to the typeid scenario. If the above expression is merely evaluated, as every discarded-value expression is (see footnote 1), where is the crucial difference that makes the evaluation in the second snippet UB? There is no existing implementation that analyzes the typeid-operand, finds the innermost, corresponding dereference and surrounds its operand with a check - there would be a performance loss, too.
A note in that issue then ends the short discussion with:
We agreed that the approach in the standard seems okay: p = 0; *p;
is not inherently an error. An lvalue-to-rvalue conversion would give
it undefined behavior.
I.e. the committee agreed upon this. Although the proposed resolution of this report, which introduced so-called "empty lvalues", was never adopted…
However, “not modifiable” is a compile-time concept, while in fact
this deals with runtime values and thus should produce undefined
behavior instead. Also, there are other contexts in which lvalues can
occur, such as the left operand of . or .*, which should also be
restricted. Additional drafting is required.
…that does not affect the rationale. Then again, it should be noted that this issue even precedes C++03, which makes it less convincing while we approach C++17.
CWG-issue #315 seems to cover your case as well:
Another instance to consider is that of invoking a member function
from a null pointer:
struct A { void f () { } };
int main ()
{
A* ap = 0;
ap->f ();
}
[…]
Rationale (October 2003):
We agreed the example should be allowed. p->f() is rewritten as
(*p).f() according to 5.2.5 [expr.ref]. *p is not an error when
p is null unless the lvalue is converted to an rvalue (4.1
[conv.lval]), which it isn't here.
According to this rationale, indirection through a null pointer per se does not invoke UB without further lvalue-to-rvalue conversions (=accesses to stored value), reference bindings, value computations or the like. (Nota bene: Calling a non-static member function with a null pointer should invoke UB, albeit merely hazily disallowed by [class.mfct.non-static]/2. The rationale is outdated in this respect.)
I.e. a mere evaluation of *d does not suffice to invoke UB. The identity of the object is not required, and neither is its previously stored value. On the other hand, e.g.
*p = 123;
is undefined since there is a value computation of the left operand, [expr.ass]/1:
In all cases, the assignment is sequenced after the value computation
of the right and left operands
Because the left operand is expected to be a glvalue, the identity of the object referred to by that glvalue must be determined as mentioned by the definition of evaluation of an expression in [intro.execution]/12, which is impossible (and thus leads to UB).
1 [expr]/11:
In some contexts, an expression only appears for its side effects.
Such an expression is called a discarded-value expression. The
expression is evaluated and its value is discarded. […]. The lvalue-to-rvalue conversion (4.1) is
applied if and only if the expression is a glvalue of
volatile-qualified type and […]
From the C++ Draft Standard N3337:
9.4 Static members
2 A static member s of class X may be referred to using the qualified-id expression X::s; it is not necessary to use the class member access syntax (5.2.5) to refer to a static member. A static member may be referred
to using the class member access syntax, in which case the object expression is evaluated.
And in the section about object expression...
5.2.5 Class member access
4 If E2 is declared to have type “reference to T,” then E1.E2 is an lvalue; the type of E1.E2 is T. Otherwise,
one of the following rules applies.
— If E2 is a static data member and the type of E2 is T, then E1.E2 is an lvalue; the expression designates the named member of the class. The type of E1.E2 is T.
Based on the last paragraph quoted from the standard, the expressions:
d->fun();
std::cout << d->a;
work because they both designate the named member of the class regardless of the value of d.
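The first quoted paragraph also notes that a static member can be named with the qualified-id X::s, in which case there is no object expression to evaluate at all. A minimal sketch adapted from the question's class (here fun() returns its message instead of printing it, and call_without_object is a hypothetical helper):

```cpp
#include <string>

class demo
{
public:
    static std::string fun() { return "fun() is called"; }
    static int a;
};
int demo::a = 9;

// Uses qualified-ids only: there is no object expression, so no pointer is evaluated.
int call_without_object()
{
    demo::fun();
    return demo::a;
}
```

Written this way, the code does not depend on whether evaluating *d for a null d is permitted.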
runs fine and produces expected output instead of any runtime error.
That's a basic assumption error. What you are doing is undefined behavior, which means that your claim for any kind of "expected output" is faulty.
Addendum: Note that, while there is a CWG defect (#315) report that is closed as "in agreement" of not making the above UB, it relies on the positive closing of another CWG defect (#232) that is still active, and hence none of it is added to the standard.
Let me quote a part of a comment from James McNellis to an answer to a similar Stack Overflow question:
I don't think CWG defect 315 is as "closed" as its presence on the "closed issues" page implies. The rationale says that it should be allowed because "*p is not an error when p is null unless the lvalue is converted to an rvalue." However, that relies on the concept of an "empty lvalue," which is part of the proposed resolution to CWG defect 232, but which has not been adopted.
The expressions d->fun() and d->a both cause evaluation of *d ([expr.ref]/2).
The complete definition of the unary * operator from [expr.unary.op]/1 is:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
For the expression d there is no "object or function to which the expression points" . Therefore this paragraph does not define the behaviour of *d.
Hence the code is undefined by omission, since the behaviour of evaluating *d is not defined anywhere in the Standard.
What you are seeing here is what I would consider an ill-conceived and unfortunate design choice in the specification of the C++ language and many other languages that belong to the same general family of programming languages.
These languages allow you to refer to static members of a class using a reference to an instance of the class. The actual value of the instance reference is of course ignored, since no instance is required to access static members.
So, in d->fun(); the compiler uses the d pointer only during compilation to figure out that you are referring to a member of the demo class, and then it ignores it. No code is emitted by the compiler to dereference the pointer, so the fact that it is going to be NULL during runtime does not matter.
So, what you see happening is in perfect accordance to the specification of the language, and in my opinion the specification suffers in this respect, because it allows an illogical thing to happen: to use an instance reference to refer to a static member.
P.S. Most compilers in most languages are actually capable of issuing warnings for that kind of stuff. I do not know about your compiler, but you might want to check, because the fact that you received no warning for doing what you did might mean that you do not have enough warnings enabled.
Is the following safe?
*(new int);
I get output as 0.
It’s undefined because you’re reading an object with an indeterminate value. The expression new int() uses zero-initialisation, guaranteeing a zero value, while new int (without parentheses) uses default-initialisation, giving you an indeterminate value. This is effectively the same as saying:
int x; // not initialised
cout << x << '\n'; // undefined value
But in addition, since you are immediately dereferencing the pointer to the object you just allocated, and do not store the pointer anywhere, this constitutes a memory leak.
Note that the presence of such an expression does not necessarily make a program ill-formed; this is a perfectly valid program, because it sets the value of the object before reading it:
int& x = *(new int); // x is an alias for a nameless new int of undefined value
x = 42;
cout << x << '\n';
delete &x;
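The difference between new int and new int() described above can be sketched as follows. The helper names are hypothetical, and std::unique_ptr is used here only so that the sketch does not leak:

```cpp
#include <memory>

// Hypothetical helper: new int() value-initializes, which for int means zero.
int read_value_initialized()
{
    std::unique_ptr<int> p(new int());  // owned, so nothing leaks
    return *p;
}

// std::make_unique<int>() also value-initializes the int (C++14 and later).
int read_make_unique()
{
    auto p = std::make_unique<int>();
    return *p;
}
```

Both functions read a value that is guaranteed to be zero, unlike *(new int), where the value read is indeterminate.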
This is undefined behavior (UB) since you are accessing an indeterminate value; C++14 clearly makes this undefined behavior. We can see that an object created by new without an initializer is default-initialized from the draft C++14 standard, section 5.3.4 New, paragraph 17, which says (emphasis mine going forward):
If the new-initializer is omitted, the object is default-initialized
(8.5). [ Note: If no initialization is performed, the object has an
indeterminate value. —end note ]
for int this means an indeterminate value, from section 8.5 paragraph 7 which says:
To default-initialize an object of type T means:
— if T is a (possibly cv-qualified) class type (Clause 9), the default constructor (12.1) for T is called (and
the initialization is ill-formed if T has no default constructor or overload resolution (13.3) results in an
ambiguity or in a function that is deleted or inaccessible from the context of the initialization);
— if T is an array type, each element is default-initialized;
— otherwise, no initialization is performed.
we can see from section 8.5 that producing an indeterminate value is undefined:
If no initializer is specified for an object, the object is
default-initialized. When storage for an object with automatic or
dynamic storage duration is obtained, the object has an indeterminate
value, and if no initialization is performed for the object, that
object retains an indeterminate value until that value is replaced
(5.17). [ Note: Objects with static or thread storage duration are
zero-initialized, see 3.6.2. — end note ]
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases
and all the exceptions have to do with unsigned narrow char which int is not.
Jon brings up an interesting example:
int& x = *(new int);
It may not be immediately obvious why this is not undefined behavior. The key point to notice is that it is undefined behavior to produce a value, but in this case no value is produced. We can see this by going to section 8.5.3 References, which covers initialization of references and says:
A reference to type “cv1 T1” is initialized by an expression of type “cv2 T2” as follows:
— If the reference is an lvalue reference and the initializer expression
— is an lvalue (but is not a bit-field), and “cv1 T1” is reference-compatible with “cv2 T2,” or
and goes on to say:
then the reference is bound to the initializer expression lvalue in
the first case [...][ Note: The usual lvalue-to-rvalue (4.1),
array-to-pointer (4.2), and function-to-pointer (4.3) standard
conversions are not needed, and therefore are suppressed, when such
direct bindings to lvalues are done. —end note ]
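The same reasoning can be illustrated with an automatic variable instead of a new-expression. This is only a sketch under that substitution, and bind_then_write is a hypothetical name:

```cpp
// Binding a reference and writing through it never reads the indeterminate value.
int bind_then_write()
{
    int x;          // default-initialized: indeterminate value
    int& r = x;     // direct binding; no lvalue-to-rvalue conversion is applied to x
    r = 42;         // replaces the indeterminate value
    return x;       // reads 42, a determinate value
}
```

The only read of x happens after the assignment through r, so no indeterminate value is ever produced.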
It is possible that a computer has "trapping" values of int: invalid values, such as a checksum bit which raises a hardware exception when it doesn't match its expected state.
In general, uninitialized values lead to undefined behavior. Initialize it first.
Otherwise, no, there's nothing wrong or really unusual about dereferencing a new-expression. Here is some odd, but entirely valid code using your construction:
int & ir = * ( new int ) = 0;
…
delete & ir;
First of all, Shafik Yaghmour gave references to the Standard in his answer. That is the best, complete and authoritative answer. Nonetheless, let me try to give you specific examples that should illustrate the aforementioned points.
This code is safe, well-formed and meaningful:
int *p = new int; // i.e. this is a local variable (ptr) that points
// to a heap-allocated block
You must not, however, read the pointed-to value before initializing it, as that results in undefined behavior. That is, you may get 0x00, or 0xFFFFFFFF, or the instruction pointer (aka RIP register on Intel) may jump to a random location. The computer may crash.
int *p = new int;
std::cout << *p; // Very bad. Undefined behavior.
Run-time checkers such as Valgrind (Memcheck) and MemorySanitizer (MSan) can catch the issue and flag it with a descriptive error message.
It is, however, perfectly fine to initialize the memory block you had allocated:
int *p = new int;
*p = 0;
Background info: this particular way of writing the specification is very useful for performance, as it is prohibitively expensive to implement the alternative.
Note, as per the Standard references, sometimes the initialization is cheap, so you can do the following:
// at the file scope
int global1; // zero-initialized
int global2 = 1; // explicitly initialized
void f()
{
std::cout << global1;
}
These things go into the executable's sections (.bss and .data) and are initialized by the OS loader.
Is there any section in the C++ standard that shows that NULL references are ill-formed?
I am trying to show my lecturer (this is for an assignment for which I am being graded) that the following expression is undefined behaviour:
AClass* ptr = 0;
AClass& ali = *ptr;
std::cout << "\n" << (AClass*) &ali << '\n';
The violations I see are the dereferencing of a null pointer and then the use of a null reference. In a program he is using as a correct example, he is comparing the address obtained from the dereferenced pointer's reference:
(AClass*) &ali != (AClass*) 0
as a test for an object's validity. I saw this as completely undefined behavior; I want to find a quote from the standard that is a bit more concrete for my explanation.
If I'm wrong, then please show where I have made an error.
§8.5.3/1: "A variable declared to be a T&, that is “reference to type T” (8.3.2), shall be initialized by an object, or function, of type T or by an object that can be converted into a T."
The code above does not initialize the reference with an object or function of type T or an object that can be converted to T. This violates the "shall". At that point, the only room for question is whether it's undefined behavior, or whether this qualifies as a diagnosable rule, in which case the compiler would be required to give an error message. Either way, it's clearly wrong though.
You should use pointers, and not references, if you wish to reassign. References are initialized once and for all and cannot be reassigned. Pointers can be created but left uninitialized, plus they can be reassigned.
8.3.2/1:
A reference shall be initialized to refer to a valid object or function.
[Note: in particular, a null reference
cannot exist in a well-defined program, because the only way to create such
a reference would be to bind it to the
“object” obtained by dereferencing a null pointer, which causes undefined
behavior. As described in 9.6, a reference cannot be bound directly to a bit-field]
1.9/4:
Certain other operations are described in this International Standard as undefined
(for example, the effect of dereferencing the null pointer)
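Given the passages quoted above, a validity test has to be made on the pointer itself, before any reference is bound. A minimal sketch, in which the AClass definition and the describe helper are hypothetical stand-ins for the code in the question:

```cpp
#include <string>

struct AClass { int id; };  // hypothetical stand-in for the class in the question

// Test the pointer itself; bind a reference only once the pointer is known to be non-null.
std::string describe(const AClass* ptr)
{
    if (ptr == nullptr) {
        return "no object";
    }
    const AClass& ali = *ptr;   // ptr points to an object on this path
    return "object " + std::to_string(ali.id);
}
```

With this ordering, the comparison against null happens on a pointer value, and the reference is never bound through a null pointer.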