Is the behavior of subtracting two NULL pointers defined? - c++

Is the difference of two non-void pointer variables defined (per C99 and/or C++98) if they are both NULL valued?
For instance, say I have a buffer structure that looks like this:
struct buf {
    char *buf;
    char *pwrite;
    char *pread;
} ex;
Say ex.buf points to an array or some malloc'ed memory. If my code always ensures that pwrite and pread point within that array or one past it, then I am fairly confident that ex.pwrite - ex.pread will always be defined. However, what if pwrite and pread are both NULL? Can I just expect subtracting the two to be defined as (ptrdiff_t)0, or does strictly compliant code need to test the pointers for NULL? Note that the only case I am interested in is when both pointers are NULL (which represents a buffer-not-initialized case). The reason has to do with a fully compliant "available" function, given that the preceding assumptions are met:
size_t buf_avail(const struct buf *b)
{
    return b->pwrite - b->pread;
}

In C99, it's technically undefined behavior. C99 §6.5.6 says:
7) For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
[...]
9) When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. [...]
And §6.3.2.3/3 says:
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.55) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
So since a null pointer compares unequal to a pointer to any object, it violates the preconditions of 6.5.6/9, so it's undefined behavior. But in practice, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.
In C89, it's also undefined behavior, though the wording of the standard is slightly different.
C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:
If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t.
C++11 (as well as the latest draft of C++14, n3690) has identical wording to C++03, with just the minor change of std::ptrdiff_t in place of ptrdiff_t.
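So for code that must also compile as strictly conforming C, the practical takeaway is to test for the uninitialized state explicitly. A minimal sketch (the function name buf_avail_checked and the assumption that both-null is the only special state are mine, not the question's):

#include <stddef.h>

struct buf { char *buf; char *pwrite; char *pread; };   /* same layout as in the question */

size_t buf_avail_checked(const struct buf *b)
{
    /* Uninitialized buffer: both pointers are null, so report 0 available
       without ever evaluating the (C-undefined) null - null subtraction. */
    if (b->pwrite == NULL && b->pread == NULL)
        return 0;
    return (size_t)(b->pwrite - b->pread);
}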

I found this in the C++ standard (5.7 [expr.add] / 7):
If two pointers [...] both are null, and the two pointers are
subtracted, the result compares equal to the value 0 converted to the
type std::ptrdiff_t
As others have said, C99 requires that two pointers being subtracted both point to elements of (or one past the end of) the same array object. NULL does not point to a valid object, which is why you cannot use it in the subtraction.

Edit: This answer is only valid for C, I didn't see the C++ tag when I answered.
No, pointer arithmetic is only allowed for pointers that point within the same object. Since by definition of the C standard null pointers don't point to any object, this is undefined behavior.
(Although, I'd guess that any reasonable compiler will return just 0 on it, but who knows.)

The C Standard does not impose any requirements on the behavior in this case, but many implementations do specify the behavior of pointer arithmetic in many cases beyond the bare minimums required by the Standard, including this one.
On any conforming C implementation, and nearly all (if not all) implementations of C-like dialects, the following guarantees will hold for any pointer p such that either *p or *(p-1) identifies some object:
For any integer value z that equals zero, the pointer values (p+z) and (p-z) will be equivalent in every way to p, except that they will only be constant if both p and z are constant.
For any q which is equivalent to p, the expressions p-q and q-p will both yield zero.
Having such guarantees hold for all pointer values, including null, may eliminate the need for some null checks in user code. Further, on most platforms, generating code that upholds such guarantees for all pointer values without regard for whether they are null would be simpler and cheaper than treating nulls specially. Some platforms, however, may trap on attempts to perform pointer arithmetic with null pointers, even when adding or subtracting zero. On such platforms, the number of compiler-generated null checks that would have to be added to pointer operations to uphold the guarantee would in many cases vastly exceed the number of user-generated null checks that could be omitted as a result.
If there were an implementation where the cost of upholding the guarantees would be great, but few if any programs would receive any benefit from them, it would make sense to allow it to trap "null+zero" computations, and require that user code for such an implementation include the manual null checks that the guarantees could have made unnecessary. Such an allowance was not expected to affect the other 99.44% of implementations, where the value of upholding the guarantees would exceed the cost. Such implementations should uphold such guarantees, but their authors shouldn't need the authors of the Standard to tell them that.
The authors of C++ have decided that conforming implementations must uphold the above guarantees at any cost, even on platforms where they could substantially degrade the performance of pointer arithmetic. They judged that the value of the guarantees even on platforms where they would be expensive to uphold would exceed the cost. Such an attitude may have been affected by a desire to treat C++ as a higher-level language than C. A C programmer could be expected to know when a particular target platform would handle cases like (null+zero) in unusual fashion, but C++ programmers weren't expected to concern themselves with such things. Guaranteeing a consistent behavioral model was thus judged to be worth the cost.
Of course, nowadays questions about what is "defined" seldom have anything to do with what behaviors a platform can support. Instead, it is now fashionable for compilers to--in the name of "optimization"--require that programmers manually write code to handle corner cases which platforms would previously have handled correctly. For example, if code which is supposed to output n characters starting at address p is written as:
void out_characters(unsigned char *p, int n)
{
    unsigned char *end = p+n;
    while(p < end)
        out_byte(*p++);
}
older compilers would generate code that would reliably output nothing, with
no side-effect, if p==NULL and n==0, with no need to special-case n==0. On
newer compilers, however, one would have to add extra code:
void out_characters(unsigned char *p, int n)
{
    if (n)
    {
        unsigned char *end = p+n;
        while(p < end)
            out_byte(*p++);
    }
}
which an optimizer may or may not be able to get rid of. Failing to include the extra code may cause some compilers to figure that since p "can't possibly be null", any subsequent null pointer checks may be omitted, thus causing the code to break in a spot unrelated to the actual "problem".
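To make that last point concrete, here is a contrived sketch (hypothetical function name; out_byte is assumed to exist as in the example above, and whether a given compiler actually performs this transformation depends on the compiler and optimization level):

#include <stddef.h>

void out_byte(unsigned char c);   /* assumed to exist, as in the example above */

void out_characters_unsafe(unsigned char *p, int n)   /* hypothetical name */
{
    unsigned char *end = p + n;   /* undefined if p is null, even when n == 0        */

    if (p == NULL)                /* a compiler may delete this check, reasoning that */
        return;                   /* p was already used in arithmetic above and so    */
                                  /* "cannot" be null                                 */
    while (p < end)
        out_byte(*p++);
}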

Related

Full emulation of `intptr_t` with `ptrdiff_t` and `nullptr`?

Given that intptr_t is optional and ptrdiff_t is mandatory, would p - nullptr be a good substitute for (intptr_t)p, to be converted back from a result denoted by n with nullptr + n instead of (decltype(p))n? As I understand it, it's semantically equivalent on implementations where intptr_t is defined, but would also work as intended otherwise.
If I'm right in the above, why does the standard allow not implementing intptr_t? It seems the liberty thus afforded isn't particularly valuable, just a set of two simple local source code transforms (or an optimized equivalent) to shave off.
No. ptrdiff_t only needs to be large enough to encompass a single object, not the entire memory space. And (char*)p - (char*)nullptr causes undefined behavior if p is not itself a null pointer.
p - nullptr without the casts is ill-formed.
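To spell out the expressions under discussion, a small sketch (the helper name example is mine; the commented-out lines are the problematic forms, and reinterpret_cast via std::intptr_t is the conventional route where that optional type exists):

#include <cstdint>

void example(int *p)                                        // hypothetical helper
{
    // long n1 = p - nullptr;                               // ill-formed: no such operator
    // auto n2 = (char*)p - (char*)nullptr;                 // undefined unless p is itself null

    std::intptr_t n = reinterpret_cast<std::intptr_t>(p);   // only where intptr_t is provided
    int *q = reinterpret_cast<int*>(n);                     // guaranteed to compare equal to p
    (void)q;
}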

For a pointer p, could p < p+1 be false in an extreme case?

Is it possible, for a pointer variable p, that p<(p+1) is false? Please explain your answer. If yes, under which circumstances can this happen?
I was wondering whether p+1 could overflow and be equal to 0.
E.g. On a 64-bit PC with GCC-4.8 for a C-language program:
#include <stdio.h>

int main(void) {
    void *p = (void *)0xFFFFFFFFFFFFFFFF;
    printf("p :%p\n", p);
    printf("p+1 :%p\n", p + 1);      /* arithmetic on void * is a GNU extension */
    printf("Result :%d\n", p < p + 1);
}
It returns:
p : 0xffffffffffffffff
p+1 : (nil)
Result : 0
So I believe it is possible in a case like this: for an invalid pointer location it can happen.
This is the only solution I can think of. Are there others?
Note:
No assumptions are made. Consider any compiler/platform/architecture/OS where there is a chance that this can happen or not.
Is it possible, for a pointer variable p, that p<(p+1) is false?
If p points to a valid object (that is, one created according to the C++ object model) of the correct type, then no. p+1 will point to the memory location after that object, and will always compare greater than p.
Otherwise, the behaviour of both the arithmetic and the comparison are undefined, so the result could be true, false, or a suffusion of yellow.
If yes, under which circumstances can this happen?
It might, or might not, happen with
p = reinterpret_cast<char*>(std::numeric_limits<std::uintptr_t>::max());
If pointer arithmetic works like unsigned integer arithmetic, then this might cause a numeric overflow such that p+1 has the value zero, and compares less than p. Or it might do something else.
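A small sketch contrasting the guaranteed case with the one the standard leaves open (assumes a hosted C++ implementation; the commented-out comparison may wrap, trap, or do anything else):

#include <cstdint>
#include <limits>

int main()
{
    int a[2];
    int *p = a;
    bool always_true = p < p + 1;   // defined: p points into an array, so this is true

    char *q = reinterpret_cast<char*>(std::numeric_limits<std::uintptr_t>::max());
    // bool unknown = q < q + 1;    // q points to no object: the arithmetic and the
                                    // comparison are both undefined
    (void)always_true;
    (void)q;
    return 0;
}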
What if I'm programming on DOS, and I have a far pointer (one composed of a segment and an offset), and it's pointing to the last address in the segment, and I add one to it, and the pointer wraps around? It looks like when you're comparing them, you normalize the pointers, so the second pointer p+1 would be less than p.
This is a stab in the dark though, I don't have a DOS C compiler handy to test on.
Very simple: It cannot happen if there is no undefined behaviour involved. It can happen very easily in the presence of undefined behaviour. For details, read a copy of the C Standard or C++ Standard.
As a result, a conforming compiler is allowed to not evaluate the < operator at all and use 1 or true as the result instead. The same is true for arithmetic with signed integers (but not for unsigned integers, where it is possible for entirely legal code to have x > x+1).
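For the unsigned case just mentioned, a tiny sketch:

#include <cassert>
#include <climits>

int main()
{
    unsigned u = UINT_MAX;
    assert(u > u + 1);       // well-defined: unsigned arithmetic wraps, so u + 1 == 0

    // int i = INT_MAX;
    // bool b = i < i + 1;   // signed overflow is undefined; a compiler may assume this is true
    return 0;
}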
Your example code isn't even C or C++, so you seem to have used the compiler in a mode where it isn't a standard conforming C or C++ compiler.
It could happen with an invalid pointer.
But if the pointer points to a valid memory location, on many operating systems (e.g. Linux), it practically never happens (at least if the sizeof(*p) is not too big), because in practice the first and last pages of the address space are never mapped (but you could force a mapping with mmap & MAP_FIXED).
For freestanding implementations (i.e. inside a kernel, or on some microcontroller), things are different, and implementation specific (perhaps might be undefined behavior, or unspecified behavior).
According to Pointer comparisons in C. Are they signed or unsigned? on Stack Overflow:
You can't legally compare arbitrary pointers in C/C++. The result of such comparison is not defined.

Is performing arithmetic on a null pointer undefined behavior?

It looks to me like the following program computes an invalid pointer, since NULL is no good for anything but assignment and comparison for equality:
#include <stdlib.h>
#include <stdio.h>

int main() {
    char *c = NULL;
    c--;
    printf("c: %p\n", c);
    return 0;
}
However, it seems like none of the warnings or instrumentations in GCC or Clang targeted at undefined behavior say that this is in fact UB. Is that arithmetic actually valid and I'm being too pedantic, or is this a deficiency in their checking mechanisms that I should report?
Tested:
$ clang-3.3 -Weverything -g -O0 -fsanitize=undefined -fsanitize=null -fsanitize=address offsetnull.c -o offsetnull
$ ./offsetnull
c: 0xffffffffffffffff
$ gcc-4.8 -g -O0 -fsanitize=address offsetnull.c -o offsetnull
$ ./offsetnull
c: 0xffffffffffffffff
It seems to be pretty well documented that AddressSanitizer as used by Clang and GCC is more focused on dereference of bad pointers, so that's fair enough. But the other checks don't catch it either :-/
Edit: part of the reason that I asked this question is that the -fsanitize flags enable dynamic checks of well-definedness in the generated code. Is this something they should have caught?
Pointer arithmetic on a pointer not pointing to an array is Undefined behavior.
Also, Dereferencing a NULL pointer is undefined behavior.
char *c = NULL;
c--;
is undefined behavior because c does not point to an array.
C++11 Standard, §5.7/5:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Yes, this is undefined behavior, and is something that -fsanitize=undefined should have caught; it's already on my TODO list to add a check for this.
FWIW, the C and C++ rules here are slightly different: adding 0 to a null pointer and subtracting one null pointer from another have undefined behavior in C but not in C++. All other arithmetic on null pointers has undefined behavior in both languages.
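Stated as code (C++ semantics shown; per the rules above, even the first two assignments are undefined in C):

#include <cstddef>

int main()
{
    char *p = nullptr;
    char *a = p + 0;            // C++: defined, yields a null pointer (undefined in C)
    std::ptrdiff_t d = p - p;   // C++: defined, yields 0              (undefined in C)
    // char *b = p + 1;         // undefined behavior in both C and C++
    // char *c = p - 1;         // undefined behavior in both C and C++
    (void)a;
    (void)d;
    return 0;
}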
Not only is arithmetic on a null pointer forbidden, but the failure of implementations which trap attempted dereferences to also trap arithmetic on null pointers greatly degrades the benefit of null-pointer traps.
There is never any situation defined by the Standard where adding anything to a null pointer can yield a legitimate pointer value; further, situations in which implementations could define any useful behavior for such actions are rare and could generally better be handled via compiler intrinsics(*). On many implementations, however, if null-pointer arithmetic isn't trapped, adding an offset to a null pointer can yield a pointer which, while not valid, is no longer recognizable as a null pointer. An attempt to dereference such a pointer would not be trapped, but could trigger arbitrary effects.
Trapping pointer computations of the form (null+offset) and (null-offset) would eliminate this danger. Note that protection would not necessarily require trapping (pointer-null), (null-pointer), or (null-null); while the values returned by the first two expressions would be unlikely to have any usefulness [if an implementation were to specify that null-null would yield zero, code which targeted that particular implementation might sometimes be more efficient than code which had to special-case null], they would not generate invalid pointers. Further, having (null+0) and (null-0) yield null pointers rather than trapping would not jeopardize safety and might avoid the need for user code to special-case null pointers, but the advantages would be less compelling, since the compiler would have to add extra code to make that happen.
(*) Such an intrinsic on an 8086 compiler, for example, might accept unsigned 16-bit integers "seg" and "ofs" and read the word at address seg:ofs without a null trap, even when that address happened to be zero. Address (0x0000:0x0000) on the 8086 is an interrupt vector which some programs may need to access, and while address (0xFFFF:0x0010) accesses the same physical location as (0x0000:0x0000) on older processors with only 20 address lines, it accesses physical location 0x100000 on processors with 24 or more address lines. In some cases an alternative would be to have a special designation for pointers which are expected to point to things not recognized by the C standard (things like the interrupt vectors would qualify) and refrain from null-trapping those, or else to specify that volatile pointers will be treated in such fashion. I've seen the first behavior in at least one compiler, but don't think I've seen the second.

Can I convert a null pointer of int to a long type by reinterpret_cast

int *pt = 0;
long i = reinterpret_cast<long>(pt);
Is i guaranteed to be 0? Is this well defined or implementation-defined?
I checked the c++ standard, but it only states that
A pointer to a data object or to a function (but not a pointer to member) can be converted to any integer type large enough to contain it.
In this case, pt doesn't point to any data object. Does the rule apply to this case?
No, i is not guaranteed to be any particular value. The result is implementation-defined.†
The representation of pointers, in C++, is implementation-defined, including the representation of a null pointer. When you assign an integer value of zero to a pointer, you set that pointer to the implementation-defined null pointer value, which is not necessarily all-bits-zero. The result of casting that value to an integer is, by transitivity, implementation-defined.
Even more troublesome, though, is that the mapping done by reinterpret_cast is implementation-defined anyway. So even if the null pointer value was all-bits-zero, an implementation is free to make the result whatever it wants. You're only guaranteed that you'll get the original value when you cast back.
That all said, the next sentence after your quote includes the note:
[ Note: It is intended to be unsurprising to those who know the addressing structure
of the underlying machine. —end note ]
So even though specific mappings are not required, pragmatically you can take an educated guess.
† Assuming long is large enough. In C++0x use uintptr_t, optionally defined in <cstdint>.
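To make the one guarantee you do get (round-tripping) concrete, a sketch assuming the implementation provides std::uintptr_t and that it is wide enough:

#include <cstdint>

int main()
{
    int *pt = 0;
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(pt);  // implementation-defined value;
                                                               // commonly 0, but not required
    int *back = reinterpret_cast<int*>(v);                     // guaranteed to compare equal to pt
    return back == pt ? 0 : 1;                                 // always returns 0
}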
I don't see why it wouldn't. The issue is that the integer type in question wouldn't be able to hold values that high, but if it's a null pointer there's no problem.
You want reinterpret_cast<long>(pt), but yes, that will work. reinterpret_cast is the same as the old C-style cast (long) and assumes you know what you are doing.
Yes, it does not matter where it points at runtime; the only thing that matters is the type of the pointer. "to a data object or to a function" just means that both "regular" object pointers as well as function pointers (i.e. pointers to object code) can be recast into numeric types.
I think the code you have above safely stores 0 to 'i' on all known common platforms.

Why is address zero used for the null pointer?

In C (or C++ for that matter), pointers are special if they have the value zero: I am advised to set pointers to zero after freeing their memory, because it means freeing the pointer again isn't dangerous; when I call malloc it returns a pointer with the value zero if it can't get me memory; I use if (p != 0) all the time to make sure passed pointers are valid, etc.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other? How can 0 be used for handling null pointers if that is the case? Why isn't a negative number null instead?
Edit:
A bunch of good answers. I'll summarize what has been said in the answers expressed as my own mind interprets it and hope that the community will correct me if I misunderstand.
Like everything else in programming it's an abstraction. Just a constant, not really related to the address 0. C++0x emphasizes this by adding the keyword nullptr.
It's not even an address abstraction, it's the constant specified by the C standard and the compiler can translate it to some other number as long as it makes sure it never equals a "real" address, and equals other null pointers if 0 is not the best value to use for the platform.
In case it's not an abstraction, which was the case in the early days, the address 0 is used by the system and off limits to the programmer.
My negative number suggestion was a little wild brainstorming, I admit. Using a signed integer for addresses is a little wasteful if it means that apart from the null pointer (-1 or whatever) the value space is split evenly between positive integers that make valid addresses and negative numbers that are just wasted.
If any number is always representable by a datatype, it's 0. (Probably 1 is too. I think of the one-bit integer which would be 0 or 1 if unsigned, or just the signed bit if signed, or the two bit integer which would be [-2, 1]. But then you could just go for 0 being null and 1 being the only accessible byte in memory.)
Still there is something that is unresolved in my mind. The Stack Overflow question Pointer to a specific fixed address tells me that even if 0 for null pointer is an abstraction, other pointer values aren't necessarily. This leads me to post another Stack Overflow question, Could I ever want to access the address zero?.
2 points:
only the constant value 0 in the source code is the null pointer - the compiler implementation can use whatever value it wants or needs in the running code. Some platforms have a special pointer value that's 'invalid' that the implementation might use as the null pointer. The C FAQ has a question, "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?", that points out several platforms that used this property of 0 being the null pointer in C source while represented differently at runtime. The C++ standard has a note that makes clear that converting "an integral constant expression with value zero always yields a null pointer, but converting other expressions that happen to have value zero need not yield a null pointer".
a negative value might be just as usable by the platform as an address - the C standard simply had to choose something to use to indicate a null pointer, and zero was chosen. I'm honestly not sure if other sentinel values were considered.
The only requirements for a null pointer are:
it's guaranteed to compare unequal to a pointer to an actual object
any two null pointers will compare equal (C++ refines this such that this only needs to hold for pointers to the same type)
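A tiny assert-based sketch of those two requirements:

#include <cassert>

int main()
{
    int x = 0;
    int *a = 0;          // a null pointer
    int *b = 0;          // another null pointer
    assert(a == b);      // any two null pointers compare equal
    assert(a != &x);     // a null pointer compares unequal to a pointer to any object
    return 0;
}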
Historically, the address space starting at 0 was always ROM, used for some operating system or low level interrupt handling routines, nowadays, since everything is virtual (including address space), the operating system can map any allocation to any address, so it can specifically NOT allocate anything at address 0.
IIRC, the "null pointer" value isn't guaranteed to be zero. The compiler translates 0 into whatever "null" value is appropriate for the system (which in practice is probably always zero, but not necessarily). The same translation is applied whenever you compare a pointer against zero. Because you can only compare pointers against each other and against this special-value-0, it insulates the programmer from knowing anything about the memory representation of the system. As for why they chose 0 instead of 42 or somesuch, I'm going to guess it's because most programmers start counting at 0 :) (Also, on most systems 0 is the first memory address and they wanted it to be convenient, since in practice translations like I'm describing rarely actually take place; the language just allows for them).
You must be misunderstanding the meaning of constant zero in pointer context.
Neither in C nor in C++ can pointers "have value zero". Pointers are not arithmetic objects. They cannot have numerical values like "zero" or "negative" or anything of that nature. So your statement about "pointers ... have the value zero" simply makes no sense.
In C and C++ pointers can have the reserved null-pointer value. The actual representation of the null-pointer value has nothing to do with any "zeros". It can be absolutely anything appropriate for a given platform. It is true that on most platforms the null-pointer value is represented physically by an actual zero address value. However, if on some platform address 0 is actually used for some purpose (i.e. you might need to create objects at address 0), the null-pointer value on such a platform will most likely be different. It could be physically represented as the 0xFFFFFFFF address value or as the 0xBAADBAAD address value, for example.
Nevertheless, regardless of how the null-pointer value is respresented on a given platform, in your code you will still continue to designate null-pointers by constant 0. In order to assign a null-pointer value to a given pointer, you will continue to use expressions like p = 0. It is the compiler's responsibility to realize what you want and translate it into the proper null-pointer value representation, i.e. to translate it into the code that will put the address value of 0xFFFFFFFF into the pointer p, for example.
In short, the fact that you use 0 in your source code to generate null-pointer values does not mean that the null-pointer value is somehow tied to address 0. The 0 that you use in your source code is just "syntactic sugar" that has absolutely no relation to the actual physical address the null-pointer value is "pointing" to.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other?
On some/many/all operating systems, memory address 0 is special in some way. For example, it's often mapped to invalid/non-existent memory, which causes an exception if you try to access it.
Why isn't a negative number null instead?
I think that pointer values are typically treated as unsigned numbers: otherwise for example a 32-bit pointer would only be able to address 2 GB of memory, instead of 4 GB.
My guess would be that the magic value 0 was picked to define an invalid pointer since it could be tested for with fewer instructions. Some machine languages automatically set the zero and sign flags according to the data when loading registers, so you could test for a null pointer with a simple load and then a branch instruction, without doing a separate compare instruction.
(Most ISAs only set flags on ALU instructions, not loads, though. And usually you aren't producing pointers via computation, except in the compiler when parsing C source. But at least you don't need an arbitrary pointer-width constant to compare against.)
On the Commodore Pet, Vic20, and C64 which were the first machines I worked on, RAM started at location 0 so it was totally valid to read and write using a null pointer if you really wanted to.
I think it's just a convention. There must be some value to mark an invalid pointer.
You just lose one byte of address space; that should rarely be a problem.
There are no negative pointers. Pointers are always unsigned. Also, if they could be negative, your convention would mean that you lose half the address space.
Although C uses 0 to represent the null pointer, do keep in mind that the value of the pointer itself may not be a zero. However, most programmers will only ever use systems where the null pointer is, in fact, 0.
But why zero? Well, it's one address that every system shares. And oftentimes the low addresses are reserved for operating system purposes thus the value works well as being off-limits to application programs. Accidental assignment of an integer value to a pointer is as likely to end up zero as anything else.
Historically the low memory of an application was occupied by system resources. It was in those days that zero became the default null value.
While this is not necessarily true for modern systems, it is still a bad idea to set pointer values to anything but what memory allocation has handed you.
Regarding the argument about not setting a pointer to null after deleting it so that future deletes "expose errors"...
If you're really, really worried about this then a better approach, one that is guaranteed to work, is to leverage assert():
...
assert(ptr && "You're deleting this pointer twice, look for a bug?");
delete ptr;
ptr = 0;
...
This requires some extra typing, and one extra check during debug builds, but it is certain to give you what you want: notice when ptr is deleted 'twice'. The alternative given in the comment discussion, not setting the pointer to null so you'll get a crash, is simply not guaranteed to be successful. Worse, unlike the above, it can cause a crash (or much worse!) on a user if one of these "bugs" gets through to the shelf. Finally, this version lets you continue to run the program to see what actually happens.
I realize this does not answer the question asked, but I was worried that someone reading the comments might come to the conclusion that it is considered 'good practice' to NOT set pointers to 0 if it is possible they get sent to free() or delete twice. In those few cases when it is possible, it is NEVER a good practice to use undefined behavior as a debugging tool. Nobody who has ever had to hunt down a bug that was ultimately caused by deleting an invalid pointer would propose this. These kinds of errors take hours to hunt down and nearly always affect the program in a totally unexpected way that is hard or impossible to track back to the original problem.
An important reason why many operating systems use all-bits-zero for the null pointer representation, is that this means memset(struct_with_pointers, 0, sizeof struct_with_pointers) and similar will set all of the pointers inside struct_with_pointers to null pointers. This is not guaranteed by the C standard, but many, many programs assume it.
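A sketch of the assumption being described (the node struct is a hypothetical example; the memset result is common practice, not a standard guarantee):

#include <cstring>

struct node { node *next; node *prev; int value; };   // hypothetical struct with pointers

void init(node &n)
{
    std::memset(&n, 0, sizeof n);   // all bits zero: on most platforms next and prev now
                                    // read as null pointers, but the standard doesn't promise it

    // The portable spelling is explicit (or simply value-initialize: node n{};):
    // n.next = nullptr; n.prev = nullptr; n.value = 0;
}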
In one of the old DEC machines (PDP-8, I think), the C runtime would memory protect the first page of memory so that any attempt to access memory in that block would cause an exception to be raised.
The choice of sentinel value is arbitrary, and this is in fact being addressed by the next version of C++ (informally known as "C++0x", most likely to be known in the future as ISO C++ 2011) with the introduction of the keyword nullptr to represent a null-valued pointer. In C++, a value of 0 may be used as an initializing expression for any POD and for any object with a default constructor, and it has the special meaning of assigning the sentinel value in the case of a pointer initialization. As for why a negative value was not chosen, addresses usually range from 0 to 2^N − 1 for some value N. In other words, addresses are usually treated as unsigned values. If the maximum value were used as the sentinel value, then it would have to vary from system to system depending on the size of memory, whereas 0 is always a representable address. It is also used for historical reasons, as memory address 0 was typically unusable in programs, and nowadays most OSs have parts of the kernel loaded into the lower page(s) of memory, and such pages are typically protected so that touching (dereferencing) them from a program (other than the kernel) will cause a fault.
It has to have some value. Obviously you don't want to step on values the user might legitimately want to use. I would speculate that since the C runtime provides the BSS segment for zero-initialized data, it makes a certain degree of sense to interpret zero as an un-initialized pointer value.
Rarely does an OS allow you to write to address 0. It's common to stick OS-specific stuff down in low memory; namely, IDTs, page tables, etc. (The tables have to be in RAM, and it's easier to stick them at the bottom than to try and determine where the top of RAM is.) And no OS in its right mind will let you edit system tables willy-nilly.
This may not have been on K&R's minds when they made C, but it (along with the fact that 0==null is pretty easy to remember) makes 0 a popular choice.
The value 0 is a special value that takes on various meanings in specific expressions. In the case of pointers, as has been pointed out many many times, it is used probably because at the time it was the most convenient way of saying "insert the default sentinel value here." As a constant expression, it does not have the same meaning as bitwise zero (i.e., all bits set to zero) in the context of a pointer expression. In C++, there are several types that do not have a bitwise zero representation of NULL such as pointer member and pointer to member function.
Thankfully, C++0x has a new keyword for "expression that means a known invalid pointer that does not also map to bitwise zero for integral expressions": nullptr. Although there are a few systems that you can target with C++ that allow dereferencing of address 0 without barfing, so programmer beware.
There are already a lot of good answers in this thread; there are probably many different reasons for preferring the value 0 for null pointers, but I'm going to add two more:
In C++, zero-initializing a pointer will set it to null.
On many processors it is more efficient to set a value to 0 or to test for it equal/not equal to 0 than for any other constant.
This depends on the implementation of pointers in C/C++; there is no specific reason why NULL has to be equivalent to 0 in assignments to a pointer.
A null pointer is not the same thing as a null value. For example, the same strchr function of C will return a null pointer (empty on the console), while passing the value would print (null) on the console.
True function:
char *ft_strchr(const char *s, int c)
{
    int i;

    if (!s)
        return (NULL);
    i = 0;
    while (s[i])
    {
        if (s[i] == (char)c)
            return ((char*)(s + i));
        i++;
    }
    if (s[i] == (char)c)          /* emphasized in the original: returns the pointer */
        return ((char*)(s + i));
    return (NULL);
}
This produces an empty output; the trailing || shows that nothing was printed between the delimiters.
While passing the value s[i] instead gives us (null):
char *ft_strchr(const char *s, int c)
{
    int i;

    if (!s)
        return (NULL);
    i = 0;
    while (s[i])
    {
        if (s[i] == (char)c)
            return ((char*)(s + i));
        i++;
    }
    if (s[i] == (char)c)          /* emphasized in the original: returns the value s[i] */
        return (s[i]);
    return (NULL);
}
There are historic reasons for this, but there are also optimization reasons for it.
It is common for the OS to provide a process with memory pages initialized to 0. If a program wants to interpret part of that memory page as a pointer then it is 0, so it is easy enough for the program to determine that that pointer is not initialized. (this doesn't work so well when applied to uninitialized flash pages)
Another reason is that on many many processors it is very very easy to test a value's equivalence to 0. It is sometimes a free comparison done without any extra instructions needed, and usually can be done without needing to provide a zero value in another register or as a literal in the instruction stream to compare to.
The cheap comparisons for most processors are the signed less than 0, and equal to 0. (signed greater than 0 and not equal to 0 are implied by both of these)
Since one value out of all possible values needs to be reserved as bad or uninitialized, you might as well make it the one that has the cheapest test for equivalence to the bad value. This is also true for '\0'-terminated character strings.
If you were to try to use greater or less than 0 for this purpose then you would end up chopping your range of addresses in half.
The constant 0 is used instead of NULL because C was made by some cavemen trillions of years ago; NULL, NIL, ZIP, or NADDA would all have made much more sense than 0.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other?
Indeed. Although a lot of operating systems disallow you from mapping anything at address zero, even in a virtual address space (people realized C is an insecure language and, reflecting the fact that null pointer dereference bugs are very common, decided to "fix" them by disallowing userspace code to map page 0; thus, if you call a callback but the callback pointer is NULL, you won't end up executing some arbitrary code).
How can 0 be used for handling null pointers if that is the case?
Because a 0 used in comparison with a pointer will be replaced with some implementation-specific value, which is the return value of malloc on a malloc failure.
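The idiom in question, as a minimal sketch:

#include <cstdlib>

int main()
{
    void *p = std::malloc(16);
    if (p == 0) {         // the literal 0 is a null pointer constant; the compiler compares
        return 1;         // against the platform's actual null representation, whatever it is
    }
    std::free(p);
    return 0;
}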
Why isn't a negative number null instead?
This would be even more confusing.