Dereferencing an invalid pointer, then taking the address of the result

Consider:
int* ptr = (int*)0xDEADBEEF;
cout << (void*)&*ptr;
How illegal is the *, given that it's used in conjunction with an immediate & and given that there are no overloaded op&/op* in play?
(This has particular ramifications for addressing a past-the-end array element &myArray[n], an expression which is explicitly equivalent to &*(myArray+n). This Q&A addresses the wider case but I don't feel that it ever really satisfied the above question.)

According to the specification, dereferencing an invalid pointer itself produces undefined behaviour. It doesn't matter what you do after dereferencing it.

Assuming the variable `ptr' does not contain a pointer to a valid object, the undefined behavior occurs if the program necessitates the lvalue-to-rvalue conversion of the expression `*ptr', as specified in [conv.lval] (ISO/IEC 14882:2011, page 82, 4.1 [#1]).
During the evaluation of `&*ptr' the program does not necessitate the lvalue-to-rvalue conversion of the subexpression `*ptr', according to [expr.unary.op] (ISO/IEC 14882:2011, page 109, 5.3.1 [#3])
Hence, it is legal.
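For the past-the-end case raised in the question, the debate can be sidestepped by forming the pointer with arithmetic (myArray + n) instead of &myArray[n], since that spelling involves no indirection at all. A minimal sketch, assuming a hypothetical helper named sum_elements that is not part of the original post:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums n ints, using a past-the-end pointer only as
// the loop bound. The bound is formed by arithmetic and never dereferenced.
int sum_elements(const int* first, std::size_t n) {
    assert(first != nullptr || n == 0);
    const int* last = first + n;   // one past the end: fine to form and compare
    int total = 0;
    for (const int* it = first; it != last; ++it) {
        total += *it;              // only elements inside [first, last) are read
    }
    return total;
}
```

Calling sum_elements on a four-element array with n = 4 reads exactly four elements; the pointer last is only ever compared against.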

It is legal. Why wouldn't it be? You're just assigning a value to a pointer and then accessing it. Assigning the value by hand obviously has to be specified as undefined behavior, but that's the most a general specification can say. Then you use it in some embedded software controller, and it gives you the correct memory-mapped value for some device...


Why is dereferencing of nullptr while using a static method not undefined behaviour in C++?

I was reading a post on some nullptr peculiarities in C++, and a particular example caused some confusion in my understanding.
Consider (simplified example from the aforementioned post):
struct A {
void non_static_mem_fn() {}
static void static_mem_fn() {}
};
A* p{nullptr};
/*1*/ *p;
/*6*/ p->non_static_mem_fn();
/*7*/ p->static_mem_fn();
According to the authors, expression /*1*/ that dereferences the nullptr does not cause undefined behaviour by itself. Same with expression /*7*/ that uses the nullptr-object to call a static function.
The justification is based on issue 315 in C++ Standard Core Language Closed Issues, Revision 100 that has
...*p is not an error when p is null unless the lvalue is converted to an rvalue (7.1 [conv.lval]), which it isn't here.
thus making a distinction between /*6*/ and /*7*/.
So, the actual dereferencing of the nullptr is not undefined behaviour (answer on SO, discussion under issue 232 of C++ Standard, ...). Thus, the validity of /*1*/ is understandable under this assumption.
However, how is /*7*/ guaranteed to not cause UB? As per the cited quote, there is no conversion of lvalue to rvalue in p->static_mem_fn();. But the same is true for /*6*/ p->non_static_mem_fn();, and I think my guess is confirmed by the quote from the same issue 315 regarding:
/*6*/ is explicitly noted as undefined in 12.2.2
[class.mfct.non-static], even though one could argue that since non_static_mem_fn(); is
empty, there is no lvalue->rvalue conversion.
(in the quote, I changed "which" and f() to get the connection to the notation used in this question).
So, why is such a distinction made for p->static_mem_fn(); and p->non_static_mem_fn(); regarding the causality of UB? Is there an intended use of calling static functions from pointers that could potentially be nullptr?
Appendix:
this question asks about why dereferencing a nullptr is undefined behaviour. While I agree that in most cases it is a bad idea, I do not believe the statement is absolutely correct as per the links and quotes here.
similar discussion in this Q/A with some links to issue 232.
I was not able to find a question devoted to static methods and the nullptr dereferencing issue. Maybe I missed some obvious answer.
Standard citations in this answer are from the C++17 spec (N4713).
One of the sections cited in your question answers the question for non-static member functions. [class.mfct.non-static]/2:
If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.
This applies to, for example, accessing an object through a different pointer type:
std::string foo;
A *ptr = reinterpret_cast<A *>(&foo); // not UB by itself
ptr->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
A null pointer doesn't point at any valid object, so it certainly doesn't point to an object of type A either. Using your own example:
p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
With that out of the way, why does this work in the static case? Let's pull together two parts of the standard:
[expr.ref]/2:
... The expression E1->E2 is converted to the equivalent form (*(E1)).E2 ...
[class.static]/1 (emphasis mine):
... A static member may be referred to using the class member access syntax, in which case the object expression is evaluated.
The second block, in particular, says that the object expression is evaluated even for static member access. This is important if, for example, it is a function call with side effects.
Put together, this implies that these two blocks are equivalent:
// 1
p->static_mem_fn();
// 2
*p;
A::static_mem_fn();
So the final question to answer is whether *p alone is undefined behavior when p is a null pointer value.
Conventional wisdom would say "yes" but this is not actually true. There is nothing in the standard that states dereferencing a null pointer alone is UB and there are several discussions that directly support this:
Issue 315, as you have mentioned in your question, explicitly states that *p is not UB when the result is unused.
DR 1102 removes "dereferencing the null pointer" as an example of UB. The given rationale is:
There are core issues surrounding the undefined behavior of dereferencing a null pointer. It appears the intent is that dereferencing is well defined, but using the result of the dereference will yield undefined behavior. This topic is too confused to be the reference example of undefined behavior, or should be stated more precisely if it is to be retained.
This DR links to issue 232, which discusses adding wording that explicitly makes *p defined behavior when p is a null pointer, as long as the result is not used.
In conclusion:
p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
p->static_mem_fn(); // Defined behavior per issue 232 and 315.
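The [class.static]/1 point, that the object expression is evaluated even for a static member, can be observed without involving a null pointer at all. The sketch below is illustrative only: get_instance, evaluations and instance are hypothetical names, and the pointer deliberately refers to a valid object so that nothing here depends on how issue 232 is resolved.

```cpp
#include <cassert>

struct A {
    static int static_mem_fn() { return 42; }
};

int evaluations = 0;   // counts how often the object expression is evaluated
A instance;            // a valid object, so no null pointer is involved

A* get_instance() {
    ++evaluations;     // visible side effect of evaluating the object expression
    return &instance;
}

int call_static_through_pointer() {
    int before = evaluations;
    // E1->E2 with a static E2: E1 is still evaluated, although
    // static_mem_fn itself needs no object.
    int result = get_instance()->static_mem_fn();
    assert(evaluations == before + 1);
    return result;
}
```

Writing A::static_mem_fn() instead would skip the call to get_instance entirely, which is the practical difference between the two spellings.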

Is the use of "new" necessary when creating a pointer in C++? [duplicate]

So far I can't find how to deduce that the following:
int* ptr;
*ptr = 0;
is undefined behavior.
First of all, there's 5.3.1/1 that states that * means indirection which converts T* to T. But this doesn't say anything about UB.
Then there's often quoted 3.7.3.2/4 saying that using deallocation function on a non-null pointer renders the pointer invalid and later usage of the invalid pointer is UB. But in the code above there's nothing about deallocation.
How can UB be deduced in the code above?
Section 4.1 looks like a candidate (emphasis mine):
An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.
I'm sure just searching on "uninitial" in the spec can find you more candidates.
I found the answer to this question in an unexpected corner of the C++ draft standard: section 24.2 Iterator requirements, specifically section 24.2.1 In general, paragraphs 5 and 10, which respectively say (emphasis mine):
[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.
and:
An invalid iterator is an iterator that may be singular.268
and footnote 268 says:
This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.
Although it does look like there is some controversy over whether a null pointer is singular or not and it looks like the term singular value needs to be properly defined in a more general manner.
The intent of singular seems to be summed up well in defect report 278, "What does iterator validity mean?", under the rationale section, which says:
Why do we say "may be singular", instead of "is singular"? That's because a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.
So invalidation and being uninitialized may create a value that is singular but since we can not prove they are nonsingular we must assume they are singular.
Update
An alternative, common-sense approach would be to note that draft standard section 5.3.1 Unary operators, paragraph 1, says (emphasis mine):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]
and if we then go to section 3.10 Lvalues and rvalues, paragraph 1 says (emphasis mine):
An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]
but ptr will not, except by chance, point to a valid object.
The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording be removed from the Standard because it confuses people and makes the Standard more verbose than necessary.
The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.
In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.
Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.
This was true in both C++11 and C++14, although the wording changed.
In C++14 it is fully covered by [dcl.init]/12:
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
where the "following cases" are particular operations on unsigned char.
In C++11, [conv.lval/2] covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):
A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
The bolded part was removed for C++14 and replaced with the extra text in [dcl.init/12].
I'm not going to pretend I know a lot about this, but some compilers would initialize the pointer to NULL and dereferencing a pointer to NULL is UB.
Also, considering that an uninitialized pointer could point to anything (including NULL), you could conclude that it's UB when you dereference it.
A note in section 8.3.2 [dcl.ref]
[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bitfield. ]
—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]
I think I should have written this as comment instead, I'm not really that sure.
To dereference the pointer, you need to read from the pointer variable (not talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.
What you do with the value of pointer after you have read it, doesn't matter anymore at this point, be it writing to (like in your example) or reading from the object it points to.
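Since the problem in the question's snippet starts with reading an uninitialized pointer variable, the usual remedy is to give the pointer a definite value before it is ever read. A minimal sketch, assuming hypothetical helpers store_zero and demo_store_zero that do not appear in the original thread:

```cpp
#include <cassert>

// Hypothetical helper: writes 0 through ptr only when ptr is non-null.
bool store_zero(int* ptr) {
    if (ptr == nullptr) {
        return false;       // nothing to write to
    }
    *ptr = 0;
    return true;
}

bool demo_store_zero() {
    int target = 5;
    int* valid = &target;   // initialized to point at a live object
    int* empty = nullptr;   // initialized, but points at nothing
    bool wrote = store_zero(valid);
    bool skipped = !store_zero(empty);
    assert(target == 0);
    return wrote && skipped;
}
```

Both pointers hold a determinate value here, so reading them is fine; only the one that points at an object is written through.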
Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:
uint16_t foo(void);
uint16_t bar(void);
uint16_t blah(uint32_t q)
{
uint16_t a;
if (q & 1) a=foo();
if (q & 2) a=bar();
return a;
}
unsigned short test(void)
{
return blah(65540);
}
it would not be particularly surprising for test to yield 65540 even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds Indeterminate Value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behaviors could result when using even unsigned integers in such fashion, there's no reason to expect that pointers couldn't behave in even worse fashion.
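One way to avoid the surprise described above is to give the local a definite value on every path. This is a sketch only: the bodies of foo and bar are stand-ins with arbitrary return values (the text only declares them), and run_blah is a hypothetical replacement for test.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the foo() and bar() declared in the text; the return
// values are arbitrary assumptions made for illustration.
std::uint16_t foo() { return 11; }
std::uint16_t bar() { return 22; }

std::uint16_t blah(std::uint32_t q) {
    std::uint16_t a = 0;        // definite initial value on every path
    if (q & 1) a = foo();
    if (q & 2) a = bar();
    return a;
}

std::uint16_t run_blah() {
    // 65540 == 0x10004: bits 0 and 1 are clear, so neither branch runs.
    std::uint16_t r = blah(65540);
    assert(r == 0);
    return r;
}
```

With the initializer in place the result no longer depends on whatever the compiler happened to leave in a register.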

Unaligned access through reinterpret_cast

I'm in the middle of a discussion trying to figure out whether unaligned access is allowable in C++ through reinterpret_cast. I think not, but I'm having trouble finding the right part(s) of the standard which confirm or refute that. I have been looking at C++11, but I would be okay with another version if it is more clear.
Unaligned access is undefined in C11. The relevant part of the C11 standard (§ 6.3.2.3, paragraph 7):
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Since the behavior of an unaligned access is undefined, some compilers (at least GCC) take that to mean that it is okay to generate instructions which require aligned data. Most of the time the code still works for unaligned data because most x86 and ARM instructions these days work with unaligned data, but some don't. In particular, some vector instructions don't, which means that as the compiler gets better at generating optimized instructions, code which worked with older versions of the compiler may not work with newer versions. And, of course, some architectures (like MIPS) don't do as well with unaligned data.
C++11 is, of course, more complicated. § 5.2.10, paragraph 7 says:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.
Note that the last word is "unspecified", not "undefined". § 1.3.25 defines "unspecified behavior" as:
behavior, for a well-formed program construct and correct data, that depends on the implementation
[Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note]
Unless I'm missing something, the standard doesn't actually delineate the range of possible behaviors in this case, which seems to indicate to me that one very reasonable behavior is that which is implemented for C (at least by GCC): not supporting them. That would mean the compiler is free to assume unaligned accesses do not occur and emit instructions which may not work with unaligned memory, just like it does for C.
The person I'm discussing this with, however, has a different interpretation. They cite § 1.9, paragraph 5:
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
Since there is no undefined behavior, they argue that the C++ compiler has no right to assume unaligned accesses don't occur.
So, are unaligned accesses through reinterpret_cast safe in C++? Where in the specification (any version) does it say?
Edit: By "access", I mean actually loading and storing. Something like
void unaligned_cp(void* a, void* b) {
*reinterpret_cast<volatile uint32_t*>(a) =
*reinterpret_cast<volatile uint32_t*>(b);
}
How the memory is allocated is actually outside my scope (it is for a library which can be called with data from anywhere), but malloc and an array on the stack are both likely candidates. I don't want to place any restrictions on how the memory is allocated.
Edit 2: Please cite sources (i.e., the C++ standard, section and paragraph) in answers.
Looking at 3.11/1:
Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated.
There's some debate in comments about exactly what constitutes allocating an object of a type. However I believe the following argument works regardless of how that discussion is resolved:
Take *reinterpret_cast<uint32_t*>(a) for example. If this expression does not cause UB, then (according to the strict aliasing rule) there must be an object of type uint32_t (or int32_t) at the given location after this statement. Whether the object was already there, or this write created it, does not matter.
According to the above Standard quote, objects with alignment requirement can only exist in a correctly aligned state.
Therefore any attempt to create or write an object that is not correctly aligned causes UB.
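A common way to copy a 32-bit value between arbitrarily aligned addresses without ever forming a misaligned uint32_t lvalue is to go through std::memcpy. The following is a sketch, not a drop-in replacement for the question's unaligned_cp: load_u32 and store_u32 are hypothetical helper names, and the volatile qualification of the original is dropped, so it does not preserve volatile access semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads four bytes from src, whatever its alignment, into an aligned local.
std::uint32_t load_u32(const void* src) {
    assert(src != nullptr);
    std::uint32_t value;
    std::memcpy(&value, src, sizeof value);
    return value;
}

// Writes the four bytes of value to dst, whatever its alignment.
void store_u32(void* dst, std::uint32_t value) {
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof value);
}

// Copies 32 bits from b to a; neither pointer needs to be aligned.
void unaligned_cp(void* a, const void* b) {
    store_u32(a, load_u32(b));
}
```

Only the local variable value is ever accessed as a uint32_t, and it is correctly aligned by construction.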
EDIT This answers the OP's original question, which was "is accessing a misaligned pointer safe". The OP has since edited their question to "is dereferencing a misaligned pointer safe", a far more practical and less interesting question.
The round-trip cast result of the pointer value is unspecified under those circumstances. Under certain limited circumstances (involving alignment), converting a pointer to A to a pointer to B, and then back again, results in the original pointer, even if you didn't have a B in that location.
If the alignment requirements are not met, then that round trip (pointer-to-A to pointer-to-B to pointer-to-A) results in a pointer with an unspecified value.
As there are invalid pointer values, dereferencing a pointer with an unspecified value can result in undefined behavior. It is no different than *(int*)0xDEADBEEF in a sense.
Simply storing that pointer is not, however, undefined behavior.
None of the above C++ quotes talk about actually using a pointer-to-A as a pointer-to-B. Using a pointer to the "wrong type" in all but a very limited number of circumstances is undefined behavior, period.
An example of this involves creating a std::aligned_storage_t<sizeof(T), alignof(T)>. You can construct your T in that spot, and it will live there happily, even though it "actually" is an aligned_storage_t<sizeof(T), alignof(T)>. (You may, however, have to use the pointer returned from the placement new for full standard compliance; I am uncertain. See strict aliasing.)
Sadly, the standard is a bit lacking in terms of what object lifetime is. It refers to it, but does not define it well enough last I checked. You can only use a T at a particular location while a T lives there, but what that means is not made clear in all circumstances.
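The construct-in-raw-storage idea can be sketched as follows. Note the substitutions: an alignas-qualified unsigned char array stands in for std::aligned_storage_t, Point and construct_and_sum are hypothetical names, and the object is only ever used through the pointer that placement new returns, which sidesteps the uncertainty mentioned above.

```cpp
#include <cassert>
#include <new>

struct Point {
    int x;
    int y;
};

int construct_and_sum() {
    // Raw storage with the size and alignment that Point requires.
    alignas(Point) unsigned char storage[sizeof(Point)];

    // Begin the lifetime of a Point inside that storage.
    Point* p = new (storage) Point{3, 4};
    assert(static_cast<void*>(p) == static_cast<void*>(storage));

    int sum = p->x + p->y;   // access only through the returned pointer
    p->~Point();             // end the lifetime explicitly
    return sum;
}
```

Between the placement new and the destructor call a Point lives at that location; outside that window only the byte array does.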
All of your quotes are about the pointer value, not the act of dereferencing.
5.2.10, paragraph 7 says that, assuming int has a stricter alignment than char, then the round trip of char* to int* to char* generates an unspecified value for the resulting char*.
On the other hand, if you convert int* to char* to int*, you are guaranteed to get back the exact same pointer as you started with.
It doesn't talk about what you get when you dereference said pointer. It simply states that in one case, you must be able to round trip. It washes its hands of the other way around.
Suppose you have some ints, and alignof(int) > 1:
int some_ints[3] ={0};
then you have an int pointer that is offset:
int* some_ptr = (int*)(((char*)&some_ints[0])+1);
We'll presume that copying this misaligned pointer doesn't cause undefined behavior for now.
The value of some_ptr is not specified by the standard. We'll be generous and presume it actually points to some chunk of bytes within some_ints.
Now we have an int* that points to somewhere an int cannot be allocated (3.11/1). Under (3.8) the use of a pointer to an int is restricted in a number of ways. Usual use is restricted to a pointer to a T whose lifetime has begun and which has been allocated properly (/3). Some limited use is permitted on a pointer to a T which has been allocated properly, but whose lifetime has not begun (/5 and /6).
There is no way to create an int object that does not obey the alignment restrictions of int in the standard.
So the theoretical int* which claims to point to a misaligned int does not point to an int. No restrictions are placed on the behavior of said pointer when dereferenced; usual dereferencing rules provide behavior of a valid pointer to an object (including an int) and how it behaves.
And now our other assumptions. No restrictions on the value of some_ptr here are made by the standard: int* some_ptr = (int*)(((char*)&some_ints[0])+1);.
It is not a pointer to an int, much like (int*)nullptr is not a pointer to an int. Round tripping it back to a char* results in a pointer with unspecified value (it could be 0xbaadf00d or nullptr) explicitly in the standard.
The standard defines what you must do. There are (nearly? I guess evaluating it in a boolean context must return a bool) no requirements placed on the behavior of some_ptr by the standard, other than converting it back to char* results in an unspecified value (of the pointer).
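A library that accepts pointers "from anywhere" can at least test alignment before choosing a code path. The check below is a sketch under a stated assumption: it treats the result of converting a pointer to std::uintptr_t as a flat address, which is implementation-defined rather than guaranteed by the standard. is_aligned_for and demo_alignment_check are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

template <typename T>
bool is_aligned_for(const void* p) {
    // Assumption: the pointer-to-integer mapping yields a flat address.
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

bool demo_alignment_check() {
    alignas(std::uint32_t) unsigned char buffer[8] = {};
    bool base_ok = is_aligned_for<std::uint32_t>(buffer);
    bool offset_ok = is_aligned_for<std::uint32_t>(buffer + 1);
    assert(base_ok);
    // buffer + 1 can only be suitably aligned if uint32_t has alignment 1.
    return base_ok && (alignof(std::uint32_t) == 1 || !offset_ok);
}
```

When the check fails, the caller can fall back to a byte-wise copy rather than casting the pointer to a stricter type.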

Is dereferencing a pointer that's equal to nullptr undefined behavior by the standard?

A blog author has brought up a discussion about null pointer dereferencing:
http://www.viva64.com/en/b/0306/
I've put some counter arguments here:
http://bit.ly/1L98GL4
His main line of reasoning quoting the standard is this:
The '&podhd->line6' expression is undefined behavior in the C language
when 'podhd' is a null pointer.
The C99 standard says the following about the '&' address-of operator
(6.5.3.2 "Address and indirection operators"):
The operand of the unary & operator shall be either a function
designator, the result of a [] or unary * operator, or an lvalue that
designates an object that is not a bit-field and is not declared with
the register storage-class specifier.
The expression 'podhd->line6' is clearly not a function designator,
the result of a [] or * operator. It is an lvalue expression. However,
when the 'podhd' pointer is NULL, the expression does not designate an
object since 6.3.2.3 "Pointers" says:
If a null pointer constant is converted to a pointer type, the
resulting pointer, called a null pointer, is guaranteed to compare
unequal to a pointer to any object or function.
When "an lvalue does not designate an object when it is evaluated, the
behavior is undefined" (C99 6.3.2.1 "Lvalues, arrays, and function
designators"):
An lvalue is an expression with an object type or an incomplete type
other than void; if an lvalue does not designate an object when it is
evaluated, the behavior is undefined.
So, the same idea in brief:
When -> was executed on the pointer, it evaluated to an lvalue where
no object exists, and as a result the behavior is undefined.
This question is purely language based, I'm not asking regarding whether a given system allows one to tamper with what lies at address 0 in any language.
As far as I can see, there's no restriction on dereferencing a pointer variable whose value is equal to nullptr, even though comparisons of a pointer against the nullptr (or (void *) 0) constant can vanish in optimizations in certain situations because of the stated paragraphs. That looks like another issue, though; it doesn't prevent dereferencing a pointer whose value is equal to nullptr. Notice that I've checked other SO questions and answers (I particularly like this set of quotations), as well as the standard quotes above, and I didn't stumble upon anything that clearly infers from the standard that if a pointer ptr compares equal to nullptr, dereferencing it would be undefined behavior.
At most what I get is that dereferencing the constant (or its cast to any pointer type) is what is UB, but nothing is said about a variable that's bit-equal to the value that comes from nullptr.
I'd like to clearly separate the nullptr constant from a pointer variable that holds a value equal to it. But an answer that addresses both cases is ideal.
I do realise that optimizations can kick in when there are comparisons against nullptr, etc., and may simply strip code based on that.
If the conclusion is that dereferencing ptr is definitely UB when it equals the value of nullptr, another question follows:
Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?
As you quote C, dereferencing a null pointer is clearly undefined behavior from this Standard quote (emphasis mine):
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the
behavior of the unary * operator is undefined.102)"
102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."
Exact same quote in C99 and similar in C89 / C90.
C++
dcl.ref/5.
There shall be no references to references, no arrays of references, and no pointers to references. The
declaration of a reference shall contain an initializer (8.5.3) except when the declaration contains an explicit
extern specifier (7.1.1), is a class member (9.2) declaration within a class definition, or is the declaration
of a parameter or a return type (8.3.5); see 3.1. A reference shall be initialized to refer to a valid object or
function. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way
to create such a reference would be to bind it to the “object” obtained by indirection through a null pointer,
which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field.
— end note ]
The note is of interest, as it explicitly says dereferencing a null pointer is undefined.
I'm sure it says it somewhere else in a more relevant context, but this is good enough.
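In practice, the note means a pointer should be checked before it is turned into a reference. A minimal sketch, assuming hypothetical helpers checked_ref and demo_checked_ref; throwing std::invalid_argument is just one possible policy:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: converts a pointer to a reference, refusing null
// instead of binding a reference through a null pointer.
int& checked_ref(int* p) {
    if (p == nullptr) {
        throw std::invalid_argument("checked_ref: null pointer");
    }
    return *p;
}

bool demo_checked_ref() {
    int value = 5;
    int& ref = checked_ref(&value);
    assert(&ref == &value);     // refers to the original object
    try {
        checked_ref(nullptr);
    } catch (const std::invalid_argument&) {
        return true;            // the null case was rejected
    }
    return false;
}
```

The indirection *p is only reached once p is known to be non-null, so the reference always refers to a valid object.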
The answer to this that I see, as to what degree a NULL value may be dereferenced, is that it is deliberately left platform-dependent in an unspecified manner, due to what is left implementation-defined in C11 6.3.2.3p5 and p6. This is mostly to support freestanding implementations used for developing boot code for a platform, as OP indicates in his rebuttal link, but has applications for a hosted implementation too.
Re:
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"
102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."
This is phrased as it is, afaict, because each of the cases in the footnote may NOT be invalid for specific platforms a compiler is targeting. If there's a defect there, it's that "invalid value" should be italicized and qualified by "implementation-defined". For the alignment case a platform may be able to access any type using any address so has no alignment requirements, especially if address rollover is supported; and a platform may assume an object's lifetime only ends after the application has exited, allocating a new frame via malloc() for automatic variables on each function call.
For null pointers, at boot time a platform may have expectations that structures the processor uses have specific physical addresses, including at address 0, and get represented as object pointers in source code, or may require the function defining the boot process to use a base address of 0. If the standard didn't permit dereferences like '&podhd->line6', where a platform required podhd to have a base address of 0, then assembly language would be needed to access that structure. Similarly, a soft reboot function might need to dereference a 0 valued pointer as a void function invocation. A hosted implementation may consider 0 the base of an executable image, and map a NULL pointer in source code to the header of that image, after loading, as the struct required to be at logical address 0 for that instance of the C virtual machine.
What the standard calls pointers are more handles into the virtual address space of the virtual machine, where object handles have more requirements on what operations are permitted for them. How the compiler emits code that takes the requirements of these handles into account for a specific processor is left undefined. What is efficient for one processor may not be for another, after all.
The requirement on (void *)0 is more that the compiler emit code that guarantees expressions where the source uses (void *)0, explicitly or by referencing NULL, that the actual value stored will be one that says this can't point to any valid function definitions or objects by any mapping code. This does not have to be a 0! Similarly, for casts of (void *)0 to (obj_type) and (func_type), these are only required to get assigned values that evaluate as addresses the compiler guarantees are not being used then for objects or code. The difference with the latter is these are unused, not invalid, so are capable of being dereferenced in the defined manner.
The code that tests for pointer equality would then check if one operand is one of these values that the other is one of the 3, not just the same bit pattern, because this scoreboards them with the RTTI of being a (null *) type, distinct from void, obj, and func pointer types to defined entities. The standard could be more explicit it is a distinct type, if unnamed because compilers only use it internally, but I suppose this is considered obvious by "null pointer" being italicized. Effectively, imo, a '0' in these contexts is an additional keyword token of the compiler, due to the additional requirement of it identifying the (null *) type, but isn't characterized as such because this would complicate the definition of < identifiers >.
For a (void *)0, this stored value can be SIZE_MAX as easily as 0 in emitted application code when an implementation, for example, defines the range 0 to SIZE_MAX-4*sizeof(void *) of virtual machine handles as what is valid for code and data. The NULL macro may even be defined as (void *)SIZE_MAX, and it would be up to the compiler to figure out from context that this has the same semantics as 0. In pointer <--> pointer casts, the casting code is responsible for noting that it is the chosen value and for supplying what is appropriate as an object or function pointer. Pointer <--> integer casts, implicit or explicit, have similar check-and-supply requirements, especially in unions where a (u)intptr_t field overlays a (type *) field. Portable code can guard against compilers not doing this properly with an explicit *(ptr==NULL?(type *)0:ptr) expression.
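The distinction drawn above between the null pointer value and its stored bit pattern can be probed with a small sketch. The function names are hypothetical, and the first result depends on the implementation; only the second check is something the language guarantees.

```cpp
#include <cassert>
#include <cstring>

// Reports whether this implementation stores a null int* as all-zero bytes.
// The answer is implementation-specific; the language does not require true.
bool null_is_all_zero_bytes() {
    int* p = nullptr;
    unsigned char zeros[sizeof p] = {};
    return std::memcmp(&p, zeros, sizeof p) == 0;
}

// Holds on every conforming implementation, whatever bits are stored:
// a null pointer compares equal to the constant 0 and tests as false.
bool null_compares_equal_to_zero_constant() {
    int* p = nullptr;
    return p == 0 && !p;
}

// Usage sketch: call from main().
void demo_null_representation() {
    assert(null_compares_equal_to_zero_constant());
    (void)null_is_all_zero_bytes();   // inspect, but do not rely on, the result
}
```

On common desktop platforms the first function is likely to return true, but code that depends on that is not portable.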

Is calling a function on a NULL pointer undefined? [duplicate]

Possible Duplicate:
When does invoking a member function on a null instance result in undefined behavior?
C++ standard: dereferencing NULL pointer to get a reference?
Say I have the class:
class A
{
public:
    void foo() { cout << "foo"; }
};
and call foo like so:
A* a = NULL;
a->foo();
I suspect this invokes undefined behavior, since it's equivalent to (*a).foo() (or is it?), and dereferencing a NULL is UB, but I can't find the reference. Can anyone help me out? Or is it defined?
No, the function is not virtual. No, I'm not accessing any members.
EDIT: I voted to close this question but will not delete it as I couldn't find the duplicate myself, and I suspect this title might be easier to find by others.
I'm looking for the reference that says a->x is equivalent to (*a).x.
Here it is:
[C++11: 5.2.5/2]: For the first option (dot) the first expression shall have complete class type. For the second option (arrow) the first expression shall have pointer to complete class type. The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5 will address only the first option (dot). In either case, the id-expression shall name a member of the class or of one of its base classes. [ Note: because the name of a class is inserted in its class scope (Clause 9), the name of a class is also considered a nested member of that class. —end note ] [ Note: 3.4.5 describes how names are looked up after the . and -> operators. —end note ]
There is no direct quotation for dereferencing a NULL pointer being UB, unfortunately. You may find more under this question: When does invoking a member function on a null instance result in undefined behavior?
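The equivalence quoted from 5.2.5/2 can be checked on a valid object. This is a minimal sketch; Widget and arrow_matches_dot are hypothetical names used only for illustration, and the pointer is deliberately required to be non-null so that nothing here depends on the disputed case.

```cpp
#include <cassert>
#include <string>

struct Widget {               // hypothetical type, for illustration only
    std::string name() const { return "widget"; }
    int value = 42;
};

// Returns true when p->m and (*p).m reach the same member of a valid object.
bool arrow_matches_dot(const Widget* p) {
    assert(p != nullptr);                              // valid pointers only
    bool same_member = (&p->value == &(*p).value);     // same subobject address
    bool same_call = (p->name() == (*p).name());       // same member function result
    return same_member && same_call;
}
```

Because E1->E2 is defined as (*(E1)).E2, any rule about applying unary * to a null pointer carries over to the arrow form unchanged.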
I'm aware of at least one case where this idiom is not only allowed but relied upon: Microsoft's MFC class CWnd provides a member function GetSafeHwnd which tests if this==NULL and returns without accessing any member variables.
Of course there are plenty of people who would claim that MFC is a very bad example.
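For comparison, the same "safe handle" idea can be written without calling a member function through a null pointer at all. This is a sketch only: Window and safe_handle are hypothetical stand-ins, not the MFC CWnd class or its GetSafeHwnd member.

```cpp
#include <cassert>

struct Window {            // hypothetical stand-in, not the MFC CWnd class
    int handle;
};

// Non-member variant of the idiom: the null test happens before any
// member access, so no member is ever reached through a null pointer.
int safe_handle(const Window* w) {
    return w == nullptr ? 0 : w->handle;
}

// Usage sketch: call from main().
void demo_safe_handle() {
    Window w{7};
    assert(safe_handle(&w) == 7);
    assert(safe_handle(nullptr) == 0);
}
```

Moving the check into a free function keeps the convenience while avoiding the question of whether this==NULL can ever be tested legally.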
Regardless of whether the behavior is undefined or not, in practice it's not likely to behave badly. The compiler will typically treat a->foo() as A::foo(a), which does not do a dereference at the call site, as long as foo is not virtual.
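The lowering described here can be modeled by hand. In this sketch foo_lowered is a hypothetical free function standing in for what a compiler might generate; it is not something the standard promises, and it says nothing about whether the original a->foo() on a null pointer is defined. foo returns its text instead of printing so that the result can be checked.

```cpp
#include <cassert>
#include <string>

class A {
public:
    // Returns the text instead of printing it so the result can be checked.
    std::string foo() { return "foo"; }
};

// Hypothetical hand-written model of a lowered non-virtual call: the object
// pointer is an ordinary first argument that the body never reads through.
std::string foo_lowered(A* self) {
    (void)self;
    return "foo";
}

// Usage sketch: call from main().
void demo_lowering() {
    A obj;
    assert(obj.foo() == foo_lowered(&obj));   // same result on a valid object
    assert(foo_lowered(nullptr) == "foo");    // well-defined: a plain function call
}
```

The free function is well-defined with a null argument precisely because it is not a member call; the member call spelling still goes through (*a).foo() as far as the language rules are concerned.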
Yes, that is UB: the pointer a does not point to a valid object at the moment it is dereferenced.
It is covered here: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
At least a couple of places in the IS state that indirection through a null pointer produces undefined behavior: 1.9 [intro.execution] paragraph 4 gives "dereferencing the null pointer" as an example of undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses this supposedly undefined behavior as justification for the nonexistence of "null references."
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary "*" operator, does not say that the behavior is undefined if the operand is a null pointer, as one might expect. Furthermore, at least one passage gives dereferencing a null pointer well-defined behavior: 5.2.8 [expr.typeid] paragraph 2 says
"If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (4.10 [conv.ptr]), the typeid expression throws the bad_typeid exception (18.7.3 [bad.typeid])."
This is inconsistent and should be cleaned up.
Read more at the link if you want to learn more.
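The typeid case cited in the issue text is the one place where applying * to a null pointer has specified behavior, and it can be observed directly. This is a minimal sketch; Base and typeid_throws_on are hypothetical names.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {                 // hypothetical polymorphic type
    virtual ~Base() = default;
};

// Returns true if evaluating typeid(*p) threw std::bad_typeid.
bool typeid_throws_on(const Base* p) {
    try {
        // The operand is evaluated because Base is polymorphic.
        (void)typeid(*p).name();
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}

// Usage sketch: call from main().
void demo_typeid() {
    Base b;
    assert(!typeid_throws_on(&b));       // valid object: no exception
    assert(typeid_throws_on(nullptr));   // null pointer: bad_typeid is thrown
}
```

The exception is only specified for polymorphic class types; for a non-polymorphic type the operand of typeid is not evaluated at all.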