Why is address zero used for the null pointer? - c++

In C (or C++ for that matter), pointers are special if they have the value zero: I am advised to set pointers to zero after freeing their memory, because it means freeing the pointer again isn't dangerous; when I call malloc it returns a pointer with the value zero if it can't get me memory; I use if (p != 0) all the time to make sure passed pointers are valid, etc.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other? How can 0 be used for handling null pointers if that is the case? Why isn't a negative number null instead?
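For concreteness, here is a small sketch of the idioms just described: checking malloc's result against null, freeing, and resetting the pointer so a later free is harmless. The sizes and values are arbitrary.

#include <cstdio>
#include <cstdlib>

int main() {
    int *p = static_cast<int *>(std::malloc(100 * sizeof(int)));
    if (p != 0) {                 // same as: if (p != nullptr) or if (p)
        p[0] = 42;
        std::free(p);
        p = 0;                    // a second std::free(p) is now a harmless no-op
    } else {
        std::puts("malloc returned a null pointer: allocation failed");
    }
    std::free(p);                 // safe: freeing a null pointer does nothing
    return 0;
}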
Edit:
A bunch of good answers. I'll summarize what has been said in the answers expressed as my own mind interprets it and hope that the community will correct me if I misunderstand.
Like everything else in programming it's an abstraction. Just a constant, not really related to the address 0. C++0x emphasizes this by adding the keyword nullptr.
It's not even an address abstraction, it's the constant specified by the C standard and the compiler can translate it to some other number as long as it makes sure it never equals a "real" address, and equals other null pointers if 0 is not the best value to use for the platform.
In case it's not an abstraction, which was the case in the early days, the address 0 is used by the system and off limits to the programmer.
My negative number suggestion was a little wild brainstorming, I admit. Using a signed integer for addresses is a little wasteful if it means that apart from the null pointer (-1 or whatever) the value space is split evenly between positive integers that make valid addresses and negative numbers that are just wasted.
If any number is always representable by a datatype, it's 0. (Probably 1 is too. I think of the one-bit integer which would be 0 or 1 if unsigned, or just the signed bit if signed, or the two bit integer which would be [-2, 1]. But then you could just go for 0 being null and 1 being the only accessible byte in memory.)
Still there is something that is unresolved in my mind. The Stack Overflow question Pointer to a specific fixed address tells me that even if 0 for null pointer is an abstraction, other pointer values aren't necessarily. This leads me to post another Stack Overflow question, Could I ever want to access the address zero?.

2 points:
only the constant value 0 in the source code is the null pointer - the compiler implementation can use whatever value it wants or needs in the running code. Some platforms have a special pointer value that's 'invalid' that the implementation might use as the null pointer. The C FAQ has a question, "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?", that points out several platforms that used this property of 0 being the null pointer in C source while being represented differently at runtime. The C++ standard has a note that makes clear that converting "an integral constant expression with value zero always yields a null pointer, but converting other expressions that happen to have value zero need not yield a null pointer". (A short sketch of this distinction follows the requirements list below.)
a negative value might be just as usable by the platform as an address - the C standard simply had to choose something to use to indicate a null pointer, and zero was chosen. I'm honestly not sure if other sentinel values were considered.
The only requirements for a null pointer are:
it's guaranteed to compare unequal to a pointer to an actual object
any two null pointers will compare equal (C++ refines this such that this only needs to hold for pointers to the same type)
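A minimal sketch of the first point, assuming a C++11 compiler (for nullptr): only an integral constant expression 0 is guaranteed to convert to a null pointer; a runtime integer that merely happens to hold zero is not.

#include <cassert>

int main() {
    int *a = 0;                            // integral constant expression 0: guaranteed null pointer
    int  z = 0;                            // a runtime int that merely happens to hold zero
    int *c = reinterpret_cast<int *>(z);   // implementation-defined; need NOT be a null pointer
    assert(a == nullptr);                  // always holds
    (void)c;                               // no portable guarantee that c == nullptr
    return 0;
}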

Historically, the address space starting at 0 was always ROM, used for the operating system or low-level interrupt handling routines. Nowadays, since everything is virtual (including the address space), the operating system can map any allocation to any address, so it can specifically NOT allocate anything at address 0.

IIRC, the "null pointer" value isn't guaranteed to be zero. The compiler translates 0 into whatever "null" value is appropriate for the system (which in practice is probably always zero, but not necessarily). The same translation is applied whenever you compare a pointer against zero. Because you can only compare pointers against each other and against this special-value-0, it insulates the programmer from knowing anything about the memory representation of the system. As for why they chose 0 instead of 42 or somesuch, I'm going to guess it's because most programmers start counting at 0 :) (Also, on most systems 0 is the first memory address and they wanted it to be convenient, since in practice translations like I'm describing rarely actually take place; the language just allows for them).

You must be misunderstanding the meaning of constant zero in pointer context.
Neither in C nor in C++ can pointers "have value zero". Pointers are not arithmetic objects. They cannot have numerical values like "zero" or "negative" or anything of that nature. So your statement about "pointers ... have the value zero" simply makes no sense.
In C and C++ pointers can have the reserved null-pointer value. The actual representation of the null-pointer value has nothing to do with any "zeros". It can be absolutely anything appropriate for a given platform. It is true that on most platforms the null-pointer value is represented physically by an actual zero address value. However, if on some platform address 0 is actually used for some purpose (i.e. you might need to create objects at address 0), the null-pointer value on such a platform will most likely be different. It could be physically represented as the 0xFFFFFFFF address value or as the 0xBAADBAAD address value, for example.
Nevertheless, regardless of how the null-pointer value is represented on a given platform, in your code you will still continue to designate null pointers by the constant 0. In order to assign a null-pointer value to a given pointer, you will continue to use expressions like p = 0. It is the compiler's responsibility to realize what you want and translate it into the proper null-pointer value representation, i.e. to translate it into code that will put the address value of 0xFFFFFFFF into the pointer p, for example.
In short, the fact that you use 0 in your source code to generate null-pointer values does not mean that the null-pointer value is somehow tied to address 0. The 0 that you use in your source code is just "syntactic sugar" that has absolutely no relation to the actual physical address the null-pointer value is "pointing" to.

But since memory addressing starts at 0, isn't 0 just as a valid address as any other?
On some/many/all operating systems, memory address 0 is special in some way. For example, it's often mapped to invalid/non-existent memory, which causes an exception if you try to access it.
Why isn't a negative number null instead?
I think that pointer values are typically treated as unsigned numbers: otherwise for example a 32-bit pointer would only be able to address 2 GB of memory, instead of 4 GB.

My guess would be that the magic value 0 was picked to define an invalid pointer since it could be tested for with fewer instructions. Some machine languages automatically set the zero and sign flags according to the data when loading registers, so you could test for a null pointer with a simple load and then a branch, without a separate compare instruction.
(Most ISAs only set flags on ALU instructions, not loads, though. And usually you aren't producing pointers via computation, except in the compiler when parsing C source. But at least you don't need an arbitrary pointer-width constant to compare against.)
On the Commodore Pet, Vic20, and C64 which were the first machines I worked on, RAM started at location 0 so it was totally valid to read and write using a null pointer if you really wanted to.

I think it's just a convention. There must be some value to mark an invalid pointer.
You just lose one address out of the address space; that should rarely be a problem.
There are no negative pointers. Pointers are always unsigned. Also if they could be negative your convention would mean that you lose half the address space.

Although C uses 0 to represent the null pointer, do keep in mind that the value of the pointer itself may not be a zero. However, most programmers will only ever use systems where the null pointer is, in fact, 0.
But why zero? Well, it's one address that every system shares. And oftentimes the low addresses are reserved for operating system purposes thus the value works well as being off-limits to application programs. Accidental assignment of an integer value to a pointer is as likely to end up zero as anything else.

Historically the low memory of an application was occupied by system resources. It was in those days that zero became the default null value.
While this is not necessarily true for modern systems, it is still a bad idea to set pointer values to anything but what memory allocation has handed you.

Regarding the argument about not setting a pointer to null after deleting it so that future deletes "expose errors"...
If you're really, really worried about this then a better approach, one that is guaranteed to work, is to leverage assert():
...
assert(ptr && "You're deleting this pointer twice, look for a bug?");
delete ptr;
ptr = 0;
...
This requires some extra typing, and one extra check during debug builds, but it is certain to give you what you want: notice when ptr is deleted 'twice'. The alternative given in the comment discussion, not setting the pointer to null so you'll get a crash, is simply not guaranteed to be successful. Worse, unlike the above, it can cause a crash (or much worse!) on a user if one of these "bugs" gets through to the shelf. Finally, this version lets you continue to run the program to see what actually happens.
I realize this does not answer the question asked, but I was worried that someone reading the comments might come to the conclusion that it is considered 'good practice' to NOT set pointers to 0 if it is possible they get sent to free() or delete twice. In those few cases when it is possible, it is NEVER a good practice to use Undefined Behavior as a debugging tool. Nobody who has ever had to hunt down a bug that was ultimately caused by deleting an invalid pointer would propose this. These kinds of errors take hours to hunt down and nearly always affect the program in a totally unexpected way that is hard or impossible to track back to the original problem.
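If you want the pattern packaged up, here is a hedged sketch of a helper; the name checked_delete is made up for this example (it is not the Boost utility of the same name).

#include <cassert>

// Wraps the assert / delete / reset-to-null pattern shown above.
template <typename T>
void checked_delete(T *&p) {
    assert(p && "You're deleting this pointer twice, look for a bug?");
    delete p;
    p = nullptr;   // a later call trips the assert in debug builds and is otherwise harmless
}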

An important reason why many operating systems use all-bits-zero for the null pointer representation, is that this means memset(struct_with_pointers, 0, sizeof struct_with_pointers) and similar will set all of the pointers inside struct_with_pointers to null pointers. This is not guaranteed by the C standard, but many, many programs assume it.
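A small sketch of what such programs assume; the struct and member names are invented for illustration.

#include <cstring>

struct widget {
    widget *next;
    char   *name;
};

int main() {
    widget w;
    // Widespread but not strictly portable: assumes all-bits-zero is the null pointer representation.
    std::memset(&w, 0, sizeof w);
    bool both_null = (w.next == nullptr) && (w.name == nullptr);   // true on mainstream platforms
    // The guaranteed-portable alternative is value-initialization: widget w2{};
    return both_null ? 0 : 1;
}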

In one of the old DEC machines (PDP-8, I think), the C runtime would memory protect the first page of memory so that any attempt to access memory in that block would cause an exception to be raised.

The choice of sentinel value is arbitrary, and this is in fact being addressed by the next version of C++ (informally known as "C++0x", most likely to be known in the future as ISO C++ 2011) with the introduction of the keyword nullptr to represent a null valued pointer. In C++, a value of 0 may be used as an initializing expression for any POD and for any object with a default constructor, and it has the special meaning of assigning the sentinel value in the case of a pointer initialization. As for why a negative value was not chosen, addresses usually range from 0 to 2^N - 1 for some value N. In other words, addresses are usually treated as unsigned values. If the maximum value were used as the sentinel value, then it would have to vary from system to system depending on the size of memory, whereas 0 is always a representable address. It is also used for historical reasons, as memory address 0 was typically unusable in programs, and nowadays most OSs have parts of the kernel loaded into the lower page(s) of memory, and such pages are typically protected in such a way that touching (dereferencing) them from a program (other than the kernel) will cause a fault.

It has to have some value. Obviously you don't want to step on values the user might legitimately want to use. I would speculate that since the C runtime provides the BSS segment for zero-initialized data, it makes a certain degree of sense to interpret zero as an un-initialized pointer value.

Rarely does an OS allow you to write to address 0. It's common to stick OS-specific stuff down in low memory; namely, IDTs, page tables, etc. (The tables have to be in RAM, and it's easier to stick them at the bottom than to try and determine where the top of RAM is.) And no OS in its right mind will let you edit system tables willy-nilly.
This may not have been on K&R's minds when they made C, but it (along with the fact that 0==null is pretty easy to remember) makes 0 a popular choice.

The value 0 is a special value that takes on various meanings in specific expressions. In the case of pointers, as has been pointed out many, many times, it is used probably because at the time it was the most convenient way of saying "insert the default sentinel value here." As a constant expression, it does not have the same meaning as bitwise zero (i.e., all bits set to zero) in the context of a pointer expression. In C++, there are several types whose null value does not have a bitwise-zero representation, such as pointer to data member and pointer to member function.
Thankfully, C++0x has a new keyword for "expression that means a known invalid pointer that does not also map to bitwise zero for integral expressions": nullptr. Although there are a few systems that you can target with C++ that allow dereferencing of address 0 without barfing, so programmer beware.

There are already a lot of good answers in this thread; there are probably many different reasons for preferring the value 0 for null pointers, but I'm going to add two more:
In C++, zero-initializing a pointer will set it to null (a short sketch follows this list).
On many processors it is more efficient to set a value to 0 or to test for it equal/not equal to 0 than for any other constant.
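A minimal C++11 sketch of the first point; the struct and variable names are invented for illustration.

struct holder {
    int *p;   // pointer member
};

int main() {
    holder h{};        // value-initialization zero-initializes members: h.p is a null pointer
    static int *s;     // static storage duration: zero-initialized, so s is a null pointer
    int *local{};      // an empty initializer also yields a null pointer
    return (h.p == nullptr && s == nullptr && local == nullptr) ? 0 : 1;
}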

This depends on the implementation of pointers in C/C++. There is no deeper reason why the constant 0 (NULL) is the value used in pointer assignments; the language simply defines it that way.

A null pointer is not the same thing as a null value. For example, the same strchr function of C will return a null pointer (which prints as nothing on the console), while returning the character value instead prints (null) on the console.
True function:
char *ft_strchr(const char *s, int c)
{
    int i;

    if (!s)
        return (NULL);
    i = 0;
    while (s[i])
    {
        if (s[i] == (char)c)
            return ((char*)(s + i));
        i++;
    }
    if (s[i] == (char)c)            /* handles c == '\0': return a pointer to the terminator */
        return ((char*)(s + i));
    return (NULL);
}
This version produces empty output on the console (nothing appears between the last pair of | delimiters).
Returning the value s[i] instead, as below, prints (null) on the console:
char *ft_strchr(const char *s, int c)
{
    int i;

    if (!s)
        return (NULL);
    i = 0;
    while (s[i])
    {
        if (s[i] == (char)c)
            return ((char*)(s + i));
        i++;
    }
    if (s[i] == (char)c)
        return (s[i]);
    return (NULL);
}

There are historic reasons for this, but there are also optimization reasons for it.
It is common for the OS to provide a process with memory pages initialized to 0. If a program wants to interpret part of that memory page as a pointer then it is 0, so it is easy enough for the program to determine that that pointer is not initialized. (this doesn't work so well when applied to uninitialized flash pages)
Another reason is that on many many processors it is very very easy to test a value's equivalence to 0. It is sometimes a free comparison done without any extra instructions needed, and usually can be done without needing to provide a zero value in another register or as a literal in the instruction stream to compare to.
The cheap comparisons for most processors are the signed less than 0, and equal to 0. (signed greater than 0 and not equal to 0 are implied by both of these)
Since one value out of all the possible values needs to be reserved as 'bad' or 'uninitialized', you might as well make it the one that has the cheapest test for equivalence to the bad value. This is also true for '\0'-terminated character strings.
If you were to try to use greater or less than 0 for this purpose then you would end up chopping your range of addresses in half.
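A small sketch of that idea: the '\0' terminator of a C string and the null pointer at the end of a linked list are the same cheap-to-test zero sentinel. The function and struct names are invented.

#include <cstddef>

struct node { node *next; };

std::size_t my_strlen(const char *s) {
    std::size_t n = 0;
    while (*s) {        // loop until the zero byte
        ++s;
        ++n;
    }
    return n;
}

std::size_t list_length(const node *p) {
    std::size_t n = 0;
    while (p) {         // loop until the null pointer
        p = p->next;
        ++n;
    }
    return n;
}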

The constant 0 is used instead of NULL because C was made by some cavemen trillions of years ago, NULL, NIL, ZIP, or NADDA would have all made much more sense than 0.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other?
Indeed. Although a lot of operating systems disallow you from mapping anything at address zero, even in a virtual address space (people realized C is an insecure language, and reflecting that null pointer dereference bugs are very common, decided to "fix" them by disallowing userspace code from mapping page 0; thus, if you call a callback but the callback pointer is NULL, you won't end up executing some arbitrary code).
How can 0 be used for handling null pointers if that is the case?
Because a constant 0 used in comparison to a pointer is replaced with some implementation-specific value, which is also the value malloc returns on a malloc failure.
Why isn't a negative number null instead?
This would be even more confusing.

Related

For a pointer p, could p < p+1 be false in an extreme case?

Is it possible, for a pointer variable p, that p<(p+1) is false? Please explain your answer. If yes, under which circumstances can this happen?
I was wondering whether p+1 could overflow and be equal to 0.
E.g. On a 64-bit PC with GCC-4.8 for a C-language program:
#include <stdio.h>

int main(void) {
    void *p = (void *)0xFFFFFFFFFFFFFFFF;

    printf("p :%p\n", p);
    printf("p+1 :%p\n", p + 1);      /* arithmetic on void * is a GCC extension */
    printf("Result :%d\n", p < p + 1);
}
It returns:
p : 0xffffffffffffffff
p+1 : (nil)
Result : 0
So I believe it is possible for this case. For an invalid pointer location it can happen.
This is the only solution I can think of. Are there others?
Note:
No assumptions are made. Consider any compiler/platform/architecture/OS where there is a chance that this can happen or not.
Is it possible, for a pointer variable p, that p<(p+1) is false?
If p points to a valid object (that is, one created according to the C++ object model) of the correct type, then no. p+1 will point to the memory location after that object, and will always compare greater than p.
Otherwise, the behaviour of both the arithmetic and the comparison are undefined, so the result could be true, false, or a suffusion of yellow.
If yes, under which circumstances can this happen?
It might, or might not, happen with
p = reinterpret_cast<char*>(std::numeric_limits<std::uintptr_t>::max());
If pointer arithmetic works like unsigned integer arithmetic, then this might cause a numeric overflow such that p+1 has the value zero, and compares less than p. Or it might do something else.
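For contrast, a small sketch of the well-defined case described at the start of this answer (a pointer to a valid object):

#include <cassert>

int main() {
    int x = 0;
    int *p = &x;
    assert(p < p + 1);   // well-defined and always true: p+1 points "one past" the object x
    // With an invalid pointer value (such as the all-ones example above),
    // both the addition and the comparison are undefined, so nothing is guaranteed.
    return 0;
}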
What if I'm programming on DOS, and I have a far pointer (one composed of a segment and an offset), and it's pointing to the last address in the segment, and I add one to it, and the pointer wraps around? It looks like when you're comparing them, you normalize the pointers, so the second pointer p+1 would be less than p.
This is a stab in the dark though, I don't have a DOS C compiler handy to test on.
Very simple: It cannot happen if there is no undefined behaviour involved. It can happen very easily in the presence of undefined behaviour. For details, read a copy of the C Standard or C++ Standard.
As a result, a conforming compiler is allowed to not evaluate the < operator at all and use 1 or true as the result instead. The same is true for arithmetic with signed integers (but not for unsigned integers, where it is possible for entirely legal code to have x > x+1).
Your example code isn't even C or C++, so you seem to have used the compiler in a mode where it isn't a standard conforming C or C++ compiler.
It could happen with an invalid pointer.
But if the pointer points to a valid memory location, on many operating systems (e.g. Linux), it practically never happens (at least if the sizeof(*p) is not too big), because in practice the first and last pages of the address space are never mapped (but you could force a mapping with mmap & MAP_FIXED).
For freestanding implementations (i.e. inside a kernel, or on some microcontroller), things are different, and implementation specific (perhaps might be undefined behavior, or unspecified behavior).
According to Pointer comparisons in C. Are they signed or unsigned? on Stack Overflow:
You can't legally compare arbitrary pointers in C/C++. The result of such comparison is not defined.

Trouble reading line of code with reference & dereference operators

I'm having trouble reading through a series of * and & operators in order to understand two lines of code within a method. The lines are:
int dummy = 1;
if (*(char*)&dummy) { //Do stuff
}
As best I can determine:
dummy is allocated on the stack and its value is set to 1
&dummy returns the memory location being used by dummy (i.e. where the 1 is)
(char*)&dummy casts &dummy into a pointer to a char, instead of a pointer to an int
*(char*)&dummy dereferences (char*)&dummy, returning whatever char has a numeric value of 1
This seems like an awfully confusing way to say:
if (1) { //Do stuff }
Am I understanding these lines correctly? If so, why would someone do this (other than to confuse me)?
The code is certainly not portable but is apparently intended to detect the endianness of the system: where the non-zero byte of int(1) is located depends on whether the system is big or little endian. In one case the result of the expression is assumed to be 0, in the other case it is assumed to be non-zero. I think it is undefined behavior anyway, though. Also, in theory there is also DS9k endianness, which entirely garbles the bytes up (although I don't think there is any system which actually does that).
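A slightly clearer spelling of the same trick, assuming int is wider than one byte; the names are invented, and the usual caveat about non-portability applies.

#include <cstdio>

int main() {
    int dummy = 1;
    // Inspect the lowest-addressed byte of an int holding 1 through an
    // unsigned char pointer (byte-wise access through char types is allowed).
    bool little_endian = *reinterpret_cast<unsigned char *>(&dummy) == 1;
    std::printf("%s-endian\n", little_endian ? "little" : "big");
    return 0;
}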

Is the behavior of subtracting two NULL pointers defined?

Is the difference of two non-void pointer variables defined (per C99 and/or C++98) if they are both NULL valued?
For instance, say I have a buffer structure that looks like this:
struct buf {
    char *buf;
    char *pwrite;
    char *pread;
} ex;
Say, ex.buf points to an array or some malloc'ed memory. If my code always ensures that pwrite and pread point within that array or one past it, then I am fairly confident that ex.pwrite - ex.pread will always be defined. However, what if pwrite and pread are both NULL. Can I just expect subtracting the two is defined as (ptrdiff_t)0 or does strictly compliant code need to test the pointers for NULL? Note that the only case I am interested in is when both pointers are NULL (which represents a buffer not initialized case). The reason has to do with a fully compliant "available" function given the preceding assumptions are met:
size_t buf_avail(const struct buf *b)
{
    return b->pwrite - b->pread;
}
In C99, it's technically undefined behavior. C99 §6.5.6 says:
7) For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
[...]
9) When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. [...]
And §6.3.2.3/3 says:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.55) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
So since a null pointer is unequal to any object, it violates the preconditions of 6.5.6/9, so it's undefined behavior. But in practicality, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.
In C89, it's also undefined behavior, though the wording of the standard is slightly different.
C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:
If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t.
C++11 (as well as the latest draft of C++14, n3690) have identical wording to C++03, with just the minor change of std::ptrdiff_t in place of ptrdiff_t.
I found this in the C++ standard (5.7 [expr.add] / 7):
If two pointers [...] both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type std::ptrdiff_t
As others have said, C99 requires addition/subtraction between 2 pointers be of the same array object. NULL does not point to a valid object which is why you cannot use it in subtraction.
Edit: This answer is only valid for C, I didn't see the C++ tag when I answered.
No, pointer arithmetic is only allowed for pointers that point within the same object. Since by definition of the C standard null pointers don't point to any object, this is undefined behavior.
(Although, I'd guess that any reasonable compiler will return just 0 on it, but who knows.)
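For anyone who wants to stay strictly conforming, here is a hedged sketch of the question's helper with the both-null case handled explicitly. The data member is renamed from buf to data because, in C++, a member may not share the name of its class; everything else follows the question's struct.

#include <cstddef>

struct buf {
    char *data;
    char *pwrite;
    char *pread;
};

// Handle the "buffer not initialized" case (both pointers null) explicitly
// instead of relying on null minus null yielding 0.
std::size_t buf_avail(const buf *b) {
    if (b->pwrite == nullptr && b->pread == nullptr)
        return 0;
    return static_cast<std::size_t>(b->pwrite - b->pread);
}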
The C Standard does not impose any requirements on the behavior in this case, but many implementations do specify the behavior of pointer arithmetic in many cases beyond the bare minimums required by the Standard, including this one.
On any conforming C implementation, and nearly all (if not all) implementations of C-like dialects, the following guarantees will hold for any pointer p such that either *p or *(p-1) identifies some object:
For any integer value z that equals zero, The pointer values (p+z) and (p-z) will be equivalent in every way to p, except that they will only be constant if both p and z are constant.
For any q which is equivalent to p, the expressions p-q and q-p will both yield zero.
Having such guarantees hold for all pointer values, including null, may eliminate the need for some null checks in user code. Further, on most platforms, generating code that upholds such guarantees for all pointer values without regard for whether they are null would be simpler and cheaper than treating nulls specially. Some platforms, however, may trap on attempts to perform pointer arithmetic with null pointers, even when adding or subtracting zero. On such platforms, the number of compiler-generated null checks that would have to be added to pointer operations to uphold the guarantee would in many cases vastly exceed the number of user-generated null checks that could be omitted as a result.
If there were an implementation where the cost of upholding the guarantees would be great, but few if any programs would receive any benefit from them, it would make sense to allow it to trap "null+zero" computations, and require that user code for such an implementation include the manual null checks that the guarantees could have made unnecessary. Such an allowance was not expected to affect the other 99.44% of implementations, where the value of upholding the guarantees would exceed the cost. Such implementations should uphold such guarantees, but their authors shouldn't need the authors of the Standard to tell them that.
The authors of C++ have decided that conforming implementations must uphold the above guarantees at any cost, even on platforms where they could substantially degrade the performance of pointer arithmetic. They judged that the value of the guarantees even on platforms where they would be expensive to uphold would exceed the cost. Such an attitude may have been affected by a desire to treat C++ as a higher-level language than C. A C programmer could be expected to know when a particular target platform would handle cases like (null+zero) in unusual fashion, but C++ programmers weren't expected to concern themselves with such things. Guaranteeing a consistent behavioral model was thus judged to be worth the cost.
Of course, nowadays questions about what is "defined" seldom have anything to do with what behaviors a platform can support. Instead, it is now fashionable for compilers to--in the name of "optimization"--require that programmers manually write code to handle corner cases which platforms would previously have handled correctly. For example, if code which is supposed to output n characters starting at address p is written as:
void out_characters(unsigned char *p, int n)
{
    unsigned char *end = p+n;
    while(p < end)
        out_byte(*p++);
}
older compilers would generate code that would reliably output nothing, with no side-effect, if p==NULL and n==0, with no need to special-case n==0. On newer compilers, however, one would have to add extra code:
void out_characters(unsigned char *p, int n)
{
    if (n)
    {
        unsigned char *end = p+n;
        while(p < end)
            out_byte(*p++);
    }
}
which an optimizer may or may not be able to get rid of. Failing to include the extra code may cause some compilers to figure that since p "can't possibly be null", any subsequent null pointer checks may be omitted, thus causing the code to break in a spot unrelated to the actual "problem".

Is there any physical part of memory with the address of NULL(0)?

I know there's an old convention that when you want to indicate a specific pointer doesn't point to anything, it should be set to NULL (actually 0). But I'm wondering: isn't there actually a physical part of memory with the address of NULL (0)?
There is always a physical address of 0 (but it may not necessarily map onto physical RAM), but on a typical platform any accesses will typically be performed in a virtual address space (as jweyrich points out below, you can use mmap and so on to directly map the physical address space), so any attempt to read/write to address 0 will raise an exception of some kind.
On simpler processors (think microcontrollers and so on), there may be no such protection, so if you attempt to write to address 0, there'll be nothing to catch you.
Note also that a null pointer doesn't necessarily have to point at address 0; the only guarantee is that it will compare equal to integer value 0.
Yes, in many systems (especially embedded) there is a memory address 0, which is legal to read and write from.
On such systems, it may be optional to set up a trap that catches such read/writes.
Yes, computers can have a physical address 0. For example, in the old DOS days, you'd regularly poke around there - that's where the interrupt table started - so if you wanted to know what would run on a keypress or timer interrupt then you could create a pointer to an array of pointers, and point that at 0. I reviewed the wording in the C++ Standard a couple years ago to see if this is necessarily undefined behaviour on a system where address 0 should be accessible (at a CPU/architecture level), and my recollection is that it wasn't explicit in saying this would cause undefined behaviour. Still, it basically reserves the right to load a non-0 value when you put 0 into a pointer, compare a pointer to 0 etc: 0 is a special sentinel value that it can do whatever it likes with, so if you cared about going "by the book" then you'd have to pussy-foot around.
Unless you write a system kernel, from your point of view there is no such memory location. Your addresses are in a virtual address space, which means they are not physical; they are translated to physical addresses by the CPU looking up system tables.
In kernel space yes NULL can be a valid address. In user space no. As for physical address, yes there always is address zero, but programs work with logical addresses.
Note that even though the value 0 compares equal to a C/C++ NULL pointer, it is not guaranteed in the standard that a null pointer actually references address zero in the (virtual) address space of the process. (It usually does, but you know, there are bound to be some microcontrollers out there etc.) So *(reinterpret_cast<int *>(&my_pointer)) may not == 0.
On some versions of Unix (but not on Linux), every process has a read-only page containing only zero bytes mapped into its address space at address zero. On those machines, a null pointer always points to a zero value. There is software out there that makes use of this feature, and crashes when ported to Linux or Windows.
Since this is tagged C++, it should be noted that the Standard guarantees that trying to access the "null pointer" via dereference evokes undefined behavior:
1.9 Program execution [intro.execution]
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior.]
...that the effect of a failed dynamic_cast is the null pointer, that deleting the null pointer has no effect, and finally that the "null pointer constant" is == the integer expression 0:
4.10 Pointer conversions [conv.ptr]
A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero.

Could I ever want to access the address zero?

The constant 0 is used as the null pointer in C and C++. But as in the question "Pointer to a specific fixed address" there seems to be some possible use of assigning fixed addresses. Is there ever any conceivable need, in any system, for whatever low level task, for accessing the address 0?
If there is, how is that solved with 0 being the null pointer and all?
If not, what makes it certain that there is not such a need?
Neither in C nor in C++ null-pointer value is in any way tied to physical address 0. The fact that you use constant 0 in the source code to set a pointer to null-pointer value is nothing more than just a piece of syntactic sugar. The compiler is required to translate it into the actual physical address used as null-pointer value on the specific platform.
In other words, 0 in the source code has no physical importance whatsoever. It could have been 42 or 13, for example. I.e. the language authors, if they so pleased, could have made it so that you'd have to do p = 42 in order to set the pointer p to null-pointer value. Again, this does not mean that the physical address 42 would have to be reserved for null pointers. The compiler would be required to translate source code p = 42 into machine code that would stuff the actual physical null-pointer value (0x0000 or 0xBAAD) into the pointer p. That's exactly how it is now with constant 0.
Also note, that neither C nor C++ provides a strictly defined feature that would allow you to assign a specific physical address to a pointer. So your question about "how one would assign 0 address to a pointer" formally has no answer. You simply can't assign a specific address to a pointer in C/C++. However, in the realm of implementation-defined features, the explicit integer-to-pointer conversion is intended to have that effect. So, you'd do it as follows
uintptr_t address = 0;
void *p = (void *) address;
Note, that this is not the same as doing
void *p = 0;
The latter always produces the null-pointer value, while the former in general case does not. The former will normally produce a pointer to physical address 0, which might or might not be the null-pointer value on the given platform.
On a tangential note: you might be interested to know that with Microsoft's C++ compiler, a NULL pointer to member will be represented as the bit pattern 0xFFFFFFFF on a 32-bit machine. That is:
struct foo
{
    int field;
};
int foo::*pmember = 0; // 'null' member pointer
pmember will have the bit pattern 'all ones'. This is because you need this value to distinguish it from
int foo::*pmember = &foo::field;
where the bit pattern will indeed be 'all zeroes' -- since we want offset 0 into the structure foo.
Other C++ compilers may choose a different bit pattern for a null pointer to member, but the key observation is that it won't be the all-zeroes bit pattern you might have been expecting.
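A small sketch of what is guaranteed regardless of the bit pattern, assuming a C++11 compiler for nullptr:

#include <cassert>

struct foo { int field; };

int main() {
    int foo::*pm = nullptr;        // null pointer-to-member, whatever its representation
    assert(!pm);                   // tests false...
    assert(pm != &foo::field);     // ...and compares unequal to a real member pointer
    // Because the representation is implementation-defined (all-ones on MSVC, as
    // described above), memset-ing a pointer-to-member to zero is not a portable
    // way to make it null.
    return 0;
}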
You're starting from a mistaken premise. When you assign an integer constant with the value 0 to a pointer, that becomes a null pointer constant. This does not, however, mean that a null pointer necessarily refers to address 0. Quite the contrary, the C and C++ standards are both very clear that a null pointer may refer to some address other than zero.
What it comes down to is this: you do have to set aside an address that a null pointer would refer to -- but it can be essentially any address you choose. When you convert zero to a pointer, it has to refer to that chosen address -- but that's all that's really required. Just for example, if you decided that converting an integer to a pointer would mean adding 0x8000 to the integer, then the null pointer would actually refer to address 0x8000 instead of address 0.
It's also worth noting that dereferencing a null pointer results in undefined behavior. That means you can't do it in portable code, but it does not mean you can't do it at all. When you're writing code for small microcontrollers and such, it's fairly common to include some bits and pieces of code that aren't portable at all. Reading from one address may give you the value from some sensor, while writing to the same address could activate a stepper motor (just for example). The next device (even using exactly the same processor) might be connected up so both of those addresses referred to normal RAM instead.
Even if a null pointer does refer to address 0, that doesn't prevent you from using it to read and/or write whatever happens to be at that address -- it just prevents you from doing so portably -- but that doesn't really matter a whole lot. The only reason address zero would normally be important would be if it was decoded to connect to something other than normal storage, so you probably can't use it entirely portably anyway.
The compiler takes care of this for you (comp.lang.c FAQ):
If a machine uses a nonzero bit pattern for null pointers, it is the compiler's responsibility to generate it when the programmer requests, by writing "0" or "NULL," a null pointer. Therefore, #defining NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other, because the compiler must (and can) still generate the machine's correct null pointers in response to unadorned 0's seen in pointer contexts.
You can get to address zero by referencing zero from a non-pointer context.
In practice, C compilers will happily let your program attempt to write to address 0. Checking every pointer operation at run time for a NULL pointer would be a tad expensive. On computers, the program will crash because the operating system forbids it. On embedded systems without memory protection, the program will indeed write to address 0 which will often crash the whole system.
The address 0 might be useful on an embedded systems (a general term for a CPU that's not in a computer; they run everything from your stereo to your digital camera). Usually, the systems are designed so that you wouldn't need to write to address 0. In every case I know of, it's some kind of special address. Even if the programmer needs to write to it (e.g., to set up an interrupt table), they would only need to write to it during the initial boot sequence (usually a short bit of assembly language to set up the environment for C).
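A non-portable, purely illustrative sketch of that kind of access: form an "address 0" pointer from a runtime integer so it is not a null pointer constant, then access a hypothetical vector table through it. The address, the function name, and the register width are made up here.

#include <cstdint>

volatile std::uint32_t *vector_table_base() {
    std::uintptr_t addr = 0;                                    // runtime zero, not a constant expression
    return reinterpret_cast<volatile std::uint32_t *>(addr);    // implementation-defined mapping
}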
Memory address 0 is also called the Zero Page. This is populated by the BIOS, and contains information about the hardware running on your system. All modern kernels protect this region of memory. You should never need to access this memory, but if you want to you need to do it from within kernel land, a kernel module will do the trick.
On the x86, address 0 (or rather, 0000:0000) and its vicinity in real mode is the location of the interrupt vector. In the bad old days, you would typically write values to the interrupt vector to install interrupt handlers (or if you were more disciplined, use the MS-DOS service 0x25). C compilers for MS-DOS defined a far pointer type which, when assigned NULL or 0, would receive the bit pattern 0000 in its segment part and 0000 in its offset part.
Of course, a misbehaving program that accidentally wrote to a far pointer whose value was 0000:0000 would cause very bad things to happen on the machine, typically locking it up and forcing a reboot.
In the question from the link, people are discussing setting to fixed addresses in a microcontroller. When you program a microcontroller everything is at a much lower level there.
You often don't even have an OS in the desktop/server PC sense, and you don't have virtual memory and that sort of thing. So there it is OK, and even necessary, to access memory at a specific address. On a modern desktop/server PC it is useless and even dangerous.
I compiled some code using gcc for the Motorola HC11, which has no MMU and 0 is a perfectly good address, and was disappointed to find out that to write to address 0, you just write to it. There's no difference between NULL and address 0.
And I can see why. I mean, it's not really possible to define a unique NULL on an architecture where every memory location is potentially valid, so I guess the gcc authors just said 0 was good enough for NULL whether it's a valid address or not.
char *null = 0;
; Clears 8-bit AR and BR and stores it as a 16-bit pointer on the stack.
; The stack pointer, ironically, is stored at address 0.
1b: 4f clra
1c: 5f clrb
1d: de 00 ldx *0 <main>
1f: ed 05 std 5,x
When I compare it with another pointer, the compiler generates a regular comparison. Meaning that it in no way considers char *null = 0 to be a special NULL pointer, and in fact a pointer to address 0 and a "NULL" pointer will be equal.
; addr is a pointer stored at 7,x (offset of 7 from the address in XR) and
; the "NULL" pointer is at 5,y (offset of 5 from the address in YR). It doesn't
; treat the so-called NULL pointer as a special pointer, which is not standards
; compliant as far as I know.
37: de 00 ldx *0 <main>
39: ec 07 ldd 7,x
3b: 18 de 00 ldy *0 <main>
3e: cd a3 05 cpd 5,y
41: 26 10 bne 53 <.LM7>
So to address the original question, I guess my answer is to check your compiler implementation and find out whether they even bothered to implement a unique-value NULL. If not, you don't have to worry about it. ;)
(Of course this answer is not standard compliant.)
It all depends on whether the machine has virtual memory. Systems with it will typically put an unwritable page there, which is probably the behaviour that you are used to. However in systems without it (typically microcontrollers these days, but they used to be far more common) then there's often very interesting things in that area such as an interrupt table. I remember hacking around with those things back in the days of 8-bit systems; fun, and not too big a pain when you had to hard-reset the system and start over. :-)
Yes, you might want to access memory address 0x0h. Why you would want to do this is platform-dependent. A processor might use this for a reset vector, such that writing to it causes the CPU to reset. It could also be used for an interrupt vector, as a memory-mapped interface to some hardware resource (program counter, system clock, etc), or it could even be valid as a plain old memory address. There is nothing necessarily magical about memory address zero, it is just one that was historically used for special purposes (reset vectors and the like). C-like languages follow this tradition by using zero as the address for a NULL pointer, but in reality the underlying hardware may or may not see address zero as special.
The need to access address zero usually arises only in low-level details like bootloaders or drivers. In these cases, the compiler can provide options/pragmas to compile a section of code without optimizations (to prevent the access through the zero address from being treated as a null-pointer dereference and optimized away), or inline assembly can be used to access the true address zero.
C/C++ don't allow you to write to just any address. It is the OS that can raise a signal when a user accesses some forbidden address. C and C++ ensure that any memory obtained from the heap will be different from 0.
I have at times used loads from address zero (on a known platform where that would be guaranteed to segfault) to deliberately crash at an informatively named symbol in library code if the user violates some necessary condition and there isn't any good way to throw an exception available to me. "Segfault at someFunction$xWasnt16ByteAligned" is a pretty effective error message to alert someone to what they did wrong and how to fix it. That said, I wouldn't recommend making a habit of that sort of thing.
Writing to address zero can be done, but it depends upon several factors such as your OS, target architecture and MMU configuration. In fact, it can be a useful debugging tool (but not always).
For example, a few years ago while working on an embedded system (with few debugging tools available), we had a problem which was resulting in a warm reboot. To help locate the problem, we were debugging using sprintf(NULL, ...); and a 9600 baud serial cable. As I said--few debugging tools available. With our setup, we knew that a warm reboot would not corrupt the first 256 bytes of memory. Thus after the warm reboot we could pause the loader and dump the memory contents to find out what happened prior to reboot.
Remember that in all normal cases, you don't actually see specific addresses.
When you allocate memory, the OS supplies you with the address of that chunk of memory.
When you take the address of a variable, the variable has already been allocated at an address determined by the system.
So accessing address zero is not really a problem, because when you follow a pointer, you don't care what address it points to, only that it is valid:
int* i = new int(); // suppose this returns a pointer to address zero
*i = 42; // now we're accessing address zero, writing the value 42 to it
So if you need to access address zero, it'll generally work just fine.
The 0 == null thing only really becomes an issue if for some reason you're accessing physical memory directly. Perhaps you're writing an OS kernel or something like that yourself. In that case, you're going to be writing to specific memory addresses (especially those mapped to hardware registers), and so you might conceivably need to write to address zero. But then you're really bypassing C++ and relying on the specifics of your compiler and hardware platform.
Of course, if you need to write to address zero, that is possible. Only the constant 0 represents a null pointer. The non-constant integer value zero will not, if assigned to a pointer, yield a null pointer.
So you could simply do something like this:
int i = 0;
int* zeroaddr = (int*)i;
now zeroaddr will point to address zero(*), but it will not, strictly speaking, be a null pointer, because the zero value was not constant.
(*): that's not entirely true. The C++ standard only guarantees an "implementation-defined mapping" between integers and addresses. It could convert the 0 to address 0x1633de20 or any other address it likes. But the mapping is usually the intuitive and obvious one, where the integer 0 is mapped to address zero.
It may surprise many people, but in the core C language there is no such thing as a special null pointer. You are totally free to read and write to address 0 if it's physically possible.
The code below does not even compile, as NULL is not defined:
int main(int argc, char *argv[])
{
    void *p = NULL;
    return 0;
}
OTOH, the code below compiles, and you can read and write address 0, if the hardware/OS allows:
int main(int argc, char *argv[])
{
    int *p = 0;
    *p = 42;
    int x = *p; /* let's assume C99 */
}
Please note, I did not include anything in the above examples.
If we start including stuff from the standard C library, NULL becomes magically defined. As far as I remember it comes from string.h (it is in fact defined in several standard headers, stddef.h among them).
NULL is still not a core C feature, it's a CONVENTION of many C library functions to indicate the invalidity of pointers. The C library on the given platform will define NULL to a memory location which is not accessible anyway. Let's try it on a Linux PC:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int *p = NULL;
    printf("NULL is address %p\n", p);
    printf("Contents of address NULL is %d\n", *p);
    return 0;
}
The result is:
NULL is address 0x0
Segmentation fault (core dumped)
So our C library defines NULL to address zero, which it turns out is inaccessible.
But it was not the C compiler, nor even the C-library function printf(), that handled the zero address specially. They all happily tried to work with it normally. It was the OS that detected a segmentation fault when printf tried to read from address zero.
If I remember correctly, in an AVR microcontroller the register file is mapped into the RAM address space and register R0 is at the address 0x00. It was clearly done on purpose, and apparently Atmel thinks there are situations when it's convenient to access address 0x00 instead of writing R0 explicitly.
In the program memory, at the address 0x0000 there is a reset interrupt vector and again this address is clearly intended to be accessed when programming the chip.