It looks to me like the following program computes an invalid pointer, since NULL is no good for anything but assignment and comparison for equality:
#include <stdlib.h>
#include <stdio.h>
int main() {
    char *c = NULL;
    c--;
    printf("c: %p\n", c);
    return 0;
}
However, it seems like none of the warnings or instrumentations in GCC or Clang targeted at undefined behavior say that this is in fact UB. Is that arithmetic actually valid and I'm being too pedantic, or is this a deficiency in their checking mechanisms that I should report?
Tested:
$ clang-3.3 -Weverything -g -O0 -fsanitize=undefined -fsanitize=null -fsanitize=address offsetnull.c -o offsetnull
$ ./offsetnull
c: 0xffffffffffffffff
$ gcc-4.8 -g -O0 -fsanitize=address offsetnull.c -o offsetnull
$ ./offsetnull
c: 0xffffffffffffffff
It seems to be pretty well documented that AddressSanitizer as used by Clang and GCC is more focused on dereference of bad pointers, so that's fair enough. But the other checks don't catch it either :-/
Edit: part of the reason that I asked this question is that the -fsanitize flags enable dynamic checks of well-definedness in the generated code. Is this something they should have caught?
Pointer arithmetic on a pointer that does not point to an element of an array object is undefined behavior.
Dereferencing a NULL pointer is also undefined behavior.
char *c = NULL;
c--;
is undefined behavior because c does not point into any array object.
C++11 Standard 5.7/5 [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
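To make the quoted rule concrete, here is a minimal C sketch (identifiers are illustrative) contrasting arithmetic the rule defines with arithmetic it leaves undefined:
#include <stdlib.h>

int main(void) {
    char buf[4];
    char *p = buf;        /* points to buf[0] */
    char *end = p + 4;    /* one past the last element: a valid pointer value, just not dereferenceable */
    (void)end;

    char *c = NULL;
    /* c--;   */          /* undefined: c does not point into any array object */
    /* p - 1; */          /* undefined: would point before the start of buf */
    (void)c;
    return EXIT_SUCCESS;
}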
Yes, this is undefined behavior, and is something that -fsanitize=undefined should have caught; it's already on my TODO list to add a check for this.
FWIW, the C and C++ rules here are slightly different: adding 0 to a null pointer and subtracting one null pointer from another have undefined behavior in C but not in C++. All other arithmetic on null pointers has undefined behavior in both languages.
Not only is arithmetic on a null pointer forbidden, but when an implementation traps attempted dereferences of null pointers yet fails to also trap arithmetic on them, much of the benefit of null-pointer trapping is lost.
There is never any situation defined by the Standard where adding anything to a null pointer can yield a legitimate pointer value; further, situations in which implementations could define any useful behavior for such actions are rare and could generally better be handled via compiler intrinsics(*). On many implementations, however, if null-pointer arithmetic isn't trapped, adding an offset to a null pointer can yield a pointer which, while not valid, is no longer recognizable as a null pointer. An attempt to dereference such a pointer would not be trapped, but could trigger arbitrary effects.
Trapping pointer computations of the form (null+offset) and (null-offset) would eliminate this danger. Note that such protection would not necessarily require trapping (pointer-null), (null-pointer), or (null-null): while the values returned by the first two expressions would be unlikely to have any usefulness [if an implementation were to specify that null-null yields zero, code targeting that particular implementation might sometimes be more efficient than code which had to special-case null], they would not generate invalid pointers. Further, having (null+0) and (null-0) yield null pointers rather than trapping would not jeopardize safety and might avoid the need for user code to special-case null pointers, but the advantage would be less compelling since the compiler would have to add extra code to make that happen.
(*) Such an intrinsic on an 8086 compiler, for example, might accept unsigned 16-bit integers "seg" and "ofs", and read the word at address seg:ofs without a null trap even when that address happened to be zero. Address 0x0000:0x0000 on the 8086 is an interrupt vector which some programs may need to access, and while address 0xFFFF:0x0010 accesses the same physical location as 0x0000:0x0000 on older processors with only 20 address lines, it accesses physical location 0x100000 on processors with 24 or more address lines. In some cases an alternative would be to have a special designation for pointers which are expected to point to things not recognized by the C standard (things like the interrupt vectors would qualify) and refrain from null-trapping those, or else to specify that volatile pointers will be treated in that fashion. I've seen the first behavior in at least one compiler, but don't think I've seen the second.
Related
I am programming C++ using gcc on an obscure system called linux x86-64. I was hoping that may be there are a few folks out there who have used this same, specific system (and might also be able to help me understand what is a valid pointer on this system). I do not care to access the location pointed to by the pointer, just want to calculate it via pointer arithmetic.
According to section 3.9.2 of the standard:
A valid value of an object pointer type represents either the address of a byte in memory (1.7) or a null pointer.
And according to [expr.add]/4:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
And according to a stackoverflow question on valid C++ pointers in general:
Is 0x1 a valid memory address on your system? Well, for some embedded systems it is. For most OSes using virtual memory, the page beginning at zero is reserved as invalid.
Well, that makes it perfectly clear! So, besides NULL, a valid pointer is a byte in memory, no, wait, it's an array element including the element right after the array, no, wait, it's a virtual memory page, no, wait, it's Superman!
(I guess that by "Superman" here I mean "garbage collectors"... not that I read that anywhere, just smelled it. Seriously, though, all the best garbage collectors don't break in a serious way if you have bogus pointers lying around; at worst they just don't collect a few dead objects every now and then. Doesn't seem like anything worth messing up pointer arithmetic for.)
So, basically, a proper compiler would have to support all of the above flavors of valid pointers. I mean, a hypothetical compiler having the audacity to generate undefined behavior just because a pointer calculation is bad would be dodging at least the 3 bullets above, right? (OK, language lawyers, that one's yours).
Furthermore, many of these definitions are next to impossible for a compiler to know about. There are just so many ways of creating a valid memory byte (think lazy segfault trap microcode, sideband hints to a custom pagetable system that I'm about to access part of an array, ...), mapping a page, or simply creating an array.
Take, for example, a largish array I created myself, and a smallish array that I let the default memory manager create inside of that:
#include <iostream>
#include <inttypes.h>
#include <assert.h>
using namespace std;
extern const char largish[1000000000000000000L];
asm("largish = 0");
int main()
{
    char* smallish = new char[1000000000];
    cout << "largish base = " << (long)largish << "\n"
         << "largish length = " << sizeof(largish) << "\n"
         << "smallish base = " << (long)smallish << "\n";
}
Result:
largish base = 0
largish length = 1000000000000000000
smallish base = 23173885579280
(Don't ask how I knew that the default memory manager would allocate something inside of the other array. It's an obscure system setting. The point is I went through weeks of debugging torment to make this example work, just to prove to you that different allocation techniques can be oblivious to one another).
Given the number of ways of managing memory and combining program modules that are supported in linux x86-64, a C++ compiler really can't know about all of the arrays and various styles of page mappings.
Finally, why do I mention gcc specifically? Because it often seems to treat any pointer as a valid pointer... Take, for instance:
char* super_tricky_add_operation(char* a, long b) {return a + b;}
While after reading all the language specs you might expect the implementation of super_tricky_add_operation(a, b) to be rife with undefined behavior, it is in fact very boring, just an add or lea instruction. Which is so great, because I can use it for very convenient and practical things like non-zero-based arrays if nobody is putzing with my add instructions just to make a point about invalid pointers. I love gcc.
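To illustrate what I mean by a non-zero-based array, here is a minimal sketch (identifiers are made up); note that forming storage - 1 is exactly the kind of arithmetic the standard refuses to define, which is the point of this question:
#include <stdio.h>

int main(void) {
    double storage[10];              /* elements I want to number 1..10 */
    double *one_based = storage - 1; /* the standard already calls this undefined... */

    for (int i = 1; i <= 10; i++)    /* ...yet on this platform it indexes as intended */
        one_based[i] = i * 0.5;

    printf("one_based[3] = %f\n", one_based[3]);
    return 0;
}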
In summary, it seems that any C++ compiler supporting standard linkage tools on linux x86-64 would almost have to treat any pointer as a valid pointer, and gcc appears to be a member of that club. But I'm not quite 100% sure (given enough fractional precision, that is).
So... can anyone give a solid example of an invalid pointer in gcc linux x86-64? By solid I mean leading to undefined behavior. And explain what gives rise to the undefined behavior allowed by the language specs?
(or provide gcc documentation proving the contrary: that all pointers are valid).
Usually pointer math does exactly what you'd expect regardless of whether pointers are pointing at objects or not.
UB doesn't mean it has to fail. Only that it's allowed to make the whole rest of the program behave strangely in some way. UB doesn't mean that just the pointer-compare result can be "wrong", it means the entire behaviour of the whole program is undefined. This tends to happen with optimizations that depend on a violated assumption.
Interesting corner cases include an array at the very top of virtual address space: a pointer to one-past-the-end would wrap to zero, so start < end would be false?!? But pointer comparison doesn't have to handle that case, because the Linux kernel won't ever map the top page, so pointers into it can't be pointing into or just past objects. See Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?
Related:
GCC does have a max object size of PTRDIFF_MAX (which is a signed type). So for example, on 32-bit x86, an array larger than 2GB isn't fully supported for all cases of code-gen, although you can mmap one.
See my comment on What is the maximum size of an array in C? - this restriction lets gcc implement pointer subtraction (to get an element count) without keeping the carry-out from the high bit. For element types wider than char, the C subtraction result is measured in elements, not bytes, so in asm it's (a - b) / sizeof(T).
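As a minimal C illustration of that scaling (assuming a plain int array): the element difference a - b equals the byte difference divided by sizeof(int), which is the division (typically a subtraction followed by an arithmetic shift) that gcc has to emit:
#include <stdio.h>
#include <stddef.h>

int main(void) {
    int arr[8];
    int *a = &arr[6];
    int *b = &arr[1];

    ptrdiff_t elems = a - b;                     /* 5: measured in elements */
    ptrdiff_t bytes = (char *)a - (char *)b;     /* 5 * sizeof(int): measured in bytes */

    printf("elements: %td, bytes: %td\n", elems, bytes);
    return 0;
}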
(Don't ask how I knew that the default memory manager would allocate something inside of the other array. It's an obscure system setting. The point is I went through weeks of debugging torment to make this example work, just to prove to you that different allocation techniques can be oblivious to one another).
First of all, you never actually allocated the space for largish[]. You used inline asm to make it start at address 0, but did nothing to actually get those pages mapped.
The kernel won't overlap existing mapped pages when new uses brk or mmap to get new memory from the kernel, so in fact static and dynamic allocation can't overlap.
Second, char[1000000000000000000L] ~= 2^59 bytes. Current x86-64 hardware and software only support canonical 48-bit virtual addresses (sign-extended to 64-bit). This will change with a future generation of Intel hardware which adds another level of page tables, taking us up to 48+9 = 57-bit addresses. (Still with the top half used by the kernel, and a big hole in the middle.)
Your unallocated space from 0 to ~2^59 covers all user-space virtual memory addresses that are possible on x86-64 Linux, so of course anything you allocate (including other static arrays) will be somewhere "inside" this fake array.
Removing the extern const from the declaration (so the array is actually allocated, https://godbolt.org/z/Hp2Exc) runs into the following problems:
//extern const
char largish[1000000000000000000L];
//asm("largish = 0");
/* rest of the code unchanged */
RIP-relative or 32-bit absolute (-fno-pie -no-pie) addressing can't reach static data that gets linked after largish[] in the BSS, with the default code model (-mcmodel=small, where all static code+data is assumed to fit in 2GB)
$ g++ -O2 large.cpp
/usr/bin/ld: /tmp/cc876exP.o: in function `_GLOBAL__sub_I_largish':
large.cpp:(.text.startup+0xd7): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/usr/bin/ld: large.cpp:(.text.startup+0xf5): relocation truncated to fit: R_X86_64_PC32 against `.bss'
collect2: error: ld returned 1 exit status
Compiling with -mcmodel=medium places largish[] in a large-data section where it doesn't interfere with addressing other static data, but largish[] itself is addressed using 64-bit absolute addressing. (Or -mcmodel=large does that for all static code/data, so every call is an indirect movabs reg,imm64 / call reg instead of call rel32.)
That lets us compile and link, but then the executable won't run because the kernel knows that only 48-bit virtual addresses are supported and won't map the program in its ELF loader before running it, or for PIE before running ld.so on it.
peter@volta:/tmp$ g++ -fno-pie -no-pie -mcmodel=medium -O2 large.cpp
peter@volta:/tmp$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffd788a4b60 /* 52 vars */) = -1 EINVAL (Invalid argument)
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)
peter@volta:/tmp$ g++ -mcmodel=medium -O2 large.cpp
peter@volta:/tmp$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffdd3bbad00 /* 52 vars */) = -1 ENOMEM (Cannot allocate memory)
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)
(Interesting that we get different error codes for PIE vs non-PIE executables, but still before execve() even completes.)
Tricking the compiler + linker + runtime with asm("largish = 0"); is not very interesting, and creates obvious undefined behaviour.
Fun fact: x64 MSVC doesn't support static objects larger than 2^31-1 bytes. IDK if it has a -mcmodel=medium equivalent. Basically, GCC fails to warn about objects too large for the selected memory model, while MSVC rejects them outright:
<source>(7): error C2148: total size of array must not exceed 0x7fffffff bytes
<source>(13): warning C4311: 'type cast': pointer truncation from 'char *' to 'long'
<source>(14): error C2070: 'char [-1486618624]': illegal sizeof operand
<source>(15): warning C4311: 'type cast': pointer truncation from 'char *' to 'long'
Also, it points out that long is the wrong type for pointers in general (because Windows x64 is an LLP64 ABI, where long is 32 bits). You want intptr_t or uintptr_t, or something equivalent to printf("%p") that prints a raw void*.
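A minimal sketch of the portable alternatives (illustrative only), assuming a C99-or-later toolchain:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    char c = 'x';
    char *p = &c;

    printf("as void*:     %p\n", (void *)p);                 /* portable way to print a pointer */
    printf("as uintptr_t: 0x%" PRIxPTR "\n", (uintptr_t)p);  /* keeps the full width, even on LLP64 */
    return 0;
}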
The Standard does not anticipate the existence of any storage beyond that which the implementation provides via objects of static, automatic, or thread duration, or the use of standard-library functions like calloc. It consequently imposes no restrictions on how implementations process pointers to such storage, since from its perspective such storage doesn't exist, pointers that meaningfully identify non-existent storage don't exist, and things that don't exist don't need to have rules written about them.
That doesn't mean that the people on the Committee weren't well aware that many execution environments provided forms of storage that C implementations might know nothing about. They expected, however, that people who actually worked with various platforms would be better placed than the Committee to determine what kinds of things programmers would need to do with such "outside" addresses, and how best to support such needs. There was no need for the Standard to concern itself with such things.
As it happens, there are some execution environments where it is more convenient for a compiler to treat pointer arithmetic like integer math than to do anything else, and many compilers for such platforms treat pointer arithmetic usefully even in cases where they're not required to do so. For 32-bit and 64-bit x86 and x64, I don't think there are any bit patterns for invalid non-null addresses, but it may be possible to form pointers that don't behave as valid pointers to the objects they address.
For example, given something like:
char x = 1, y = 2;
ptrdiff_t delta = (uintptr_t)&y - (uintptr_t)&x;   /* needs <stddef.h> and <stdint.h> */
char *p = &x + delta;                              /* bit pattern may equal &y, but p is derived from &x */
*p = 3;
even if pointer representation is defined in such a way that using integer arithmetic to add delta to the address of x would yield y, that would in no way guarantee that a compiler would recognize that operations on *p might affect y, even if p holds y's address. Pointer p would effectively behave as though its address was invalid even though the bit pattern would match that of y's address.
The following examples show that GCC specifically assumes at least the following:
A global array cannot be at address 0.
An array cannot wrap around address 0.
Examples of unexpected behavior arising from arithmetic on invalid pointers in gcc linux x86-64 C++ (thank you melpomene):
largish == NULL evaluates to false in the program in the question.
unsigned n = ...; if (ptr + n < ptr) { /* overflow */ } can be optimized to if (false); a compilable sketch appears below.
int arr[123]; int n = ...; if (arr + n < arr || arr + n > arr + 123) can be optimized to if (false).
Note that these examples all involve comparison of the invalid pointers, and therefore may not affect the practical case of non-zero-based arrays. Therefore I have opened a new question of a more practical nature.
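A compilable sketch of the second example above (function and variable names are made up; whether the comparison is actually folded to a constant depends on the gcc version and optimization level):
#include <stdio.h>

/* The check relies on ptr + n wrapping, but forming a pointer outside the
   object is undefined behaviour, so the comparison may be folded to 0
   (i.e. the function may compile to a plain "return 0"). */
int would_overflow(char *ptr, unsigned n) {
    return ptr + n < ptr;
}

int main(void) {
    char buf[16];
    printf("%d\n", would_overflow(buf, 0xFFFFFFFFu));
    return 0;
}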
Thank you everyone in the chat for helping to narrow down the question.
Is it possible, for a pointer variable p, that p<(p+1) is false? Please explain your answer. If yes, under which circumstances can this happen?
I was wondering whether p+1 could overflow and be equal to 0.
E.g. On a 64-bit PC with GCC-4.8 for a C-language program:
int main(void) {
    void *p=(void *)0xFFFFFFFFFFFFFFFF;
    printf("p :%p\n", p);
    printf("p+1 :%p\n", p+1);
    printf("Result :%d\n", p<p+1);
}
It returns:
p : 0xffffffffffffffff
p+1 : (nil)
Result : 0
So I believe it is possible for this case. For an invalid pointer location it can happen.
This is the only solution I can think of. Are there others?
Note:
No assumptions are made. Consider any compiler/platform/architecture/OS where there is a chance that this can happen or not.
Is it possible, for a pointer variable p, that p<(p+1) is false?
If p points to a valid object (that is, one created according to the C++ object model) of the correct type, then no. p+1 will point to the memory location after that object, and will always compare greater than p.
Otherwise, the behaviour of both the arithmetic and the comparison are undefined, so the result could be true, false, or a suffusion of yellow.
If yes, under which circumstances can this happen?
It might, or might not, happen with
p = reinterpret_cast<char*>(numeric_limits<uintptr_t>::max());
If pointer arithmetic works like unsigned integer arithmetic, then this might cause a numeric overflow such that p+1 has the value zero, and compares less than p. Or it might do something else.
What if I'm programming on DOS, and I have a far pointer (one composed of a segment and an offset), and it's pointing to the last address in the segment, and I add one to it, and the pointer wraps around? It looks like when you're comparing them, you normalize the pointers, so the second pointer p+1 would be less than p.
This is a stab in the dark though, I don't have a DOS C compiler handy to test on.
Very simple: It cannot happen if there is no undefined behaviour involved. It can happen very easily in the presence of undefined behaviour. For details, read a copy of the C Standard or C++ Standard.
Since p < (p+1) can only be false when undefined behaviour is involved, a conforming compiler is allowed not to evaluate the < operator at all and simply use 1 or true as the result. The same is true for arithmetic on signed integers (but not for unsigned integers, where it is possible for entirely legal code to have x > x+1; see the sketch below).
Your example code isn't even C or C++, so you seem to have used the compiler in a mode where it isn't a standard conforming C or C++ compiler.
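For the unsigned-integer case mentioned above, a minimal sketch showing that x > x+1 can legitimately be true, since unsigned arithmetic wraps by definition:
#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned x = UINT_MAX;
    /* Unsigned arithmetic wraps modulo UINT_MAX + 1, so x + 1 == 0 here. */
    printf("%d\n", x > x + 1);   /* prints 1 */
    return 0;
}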
It could happen with an invalid pointer.
But if the pointer points to a valid memory location, on many operating systems (e.g. Linux), it practically never happens (at least if the sizeof(*p) is not too big), because in practice the first and last pages of the address space are never mapped (but you could force a mapping with mmap & MAP_FIXED).
For freestanding implementations (i.e. inside a kernel, or on some microcontroller), things are different, and implementation specific (perhaps might be undefined behavior, or unspecified behavior).
According to Pointer comparisons in C. Are they signed or unsigned? on Stack Overflow:
You can't legally compare arbitrary pointers in C/C++. The result of such comparison is not defined.
let's say I have:
int test[10];
on a 32bit machine. What if I do:
int b = test[-1];
obviously that's a big no-no when it comes to access an array (out of bound) but what actually happens? Just curious
Am I accessing the 32bit word "before" my array?
int b = *(test - 1);
or just addressing a very far away word (starting at "test" memory location)?
int b = *(test + 0xFFFFFFFF);
0xFFFFFFFF is the two's complement representation of decimal -1
The behaviour of your program is undefined as you are attempting to access an element outside the bounds of the array.
What might be happening is this: Assuming you have a 32 bit int type, you're accessing the 32 bits of memory on the stack (if any) before test[0] and are casting this to an int. Your process may not even own this memory. Not good.
Whatever happens, you get undefined behaviour since pointer arithmetic is only defined within an array (including the one-past-the-end position).
A better question might be:
int test[10];
int * t1 = test+1;
int b = t1[-1]; // Is this defined behaviour?
The answer to this is yes. The definition of subscripting (C++11 5.2.1) is:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
so this is equivalent to *((t1)+(-1)). The definition of pointer addition (C++11 5.7/5) covers all integer types, signed or unsigned, so nothing causes -1 to be converted to an unsigned type; the expression is therefore equivalent to *(t1-1), which is well-defined since t1-1 is within the array bounds.
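Put together as a minimal runnable sketch (the same reasoning applies in C as in C++):
#include <stdio.h>

int main(void) {
    int test[10] = {0};
    int *t1 = test + 1;   /* points to test[1] */

    t1[-1] = 42;          /* *(t1 - 1), i.e. test[0]: well-defined */
    printf("test[0] = %d\n", test[0]);

    /* By contrast, test[-1] would form a pointer before the start of the
       array, which is undefined behaviour. */
    return 0;
}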
The C++ standard says that it's undefined behavior and illegal. What this means in practice is that anything could happen, and the anything can vary by hardware, compiler, options, and anything else you can think of. Since anything could happen there isn't a lot of point in speculating about what might happen with a particular hardware/compiler combination.
The official answer is that the behavior is undefined. Unofficially, you are trying to access the integer before the start of the array. This means you instruct the computer to calculate the address that precedes the start of the array by 4 bytes (in your case). Whether this operation succeeds depends on multiple factors, among them whether the array is allocated on the stack or in the static data segment, and where exactly that address ends up. On a general-purpose machine (Windows/Linux) you are likely to get a garbage value as a result, but it may also cause a memory violation error if the address happens to be somewhere the process is not authorized to access. What may happen on specialized hardware is anybody's guess.
Is the difference of two non-void pointer variables defined (per C99 and/or C++98) if they are both NULL valued?
For instance, say I have a buffer structure that looks like this:
struct buf {
char *buf;
char *pwrite;
char *pread;
} ex;
Say, ex.buf points to an array or some malloc'ed memory. If my code always ensures that pwrite and pread point within that array or one past it, then I am fairly confident that ex.pwrite - ex.pread will always be defined. However, what if pwrite and pread are both NULL. Can I just expect subtracting the two is defined as (ptrdiff_t)0 or does strictly compliant code need to test the pointers for NULL? Note that the only case I am interested in is when both pointers are NULL (which represents a buffer not initialized case). The reason has to do with a fully compliant "available" function given the preceding assumptions are met:
size_t buf_avail(const struct buf *b)
{
    return b->pwrite - b->pread;
}
In C99, it's technically undefined behavior. C99 §6.5.6 says:
7) For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
[...]
9) When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. [...]
And §6.3.2.3/3 says:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.55) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
So since a null pointer compares unequal to a pointer to any object, it cannot point to an element of any array, which violates the preconditions of 6.5.6/9, so it's undefined behavior. But in practice, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.
In C89, it's also undefined behavior, though the wording of the standard is slightly different.
C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:
If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t.
C++11 (as well as the latest draft of C++14, n3690) have identical wording to C++03, with just the minor change of std::ptrdiff_t in place of ptrdiff_t.
I found this in the C++ standard (5.7 [expr.add] / 7):
If two pointers [...] both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type std::ptrdiff_t
As others have said, C99 requires addition/subtraction between 2 pointers be of the same array object. NULL does not point to a valid object which is why you cannot use it in subtraction.
Edit: This answer is only valid for C, I didn't see the C++ tag when I answered.
No, pointer arithmetic is only allowed for pointers that point within the same object. Since by definition of the C standard null pointers don't point to any object, this is undefined behavior.
(Although, I'd guess that any reasonable compiler will return just 0 on it, but who knows.)
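For strictly conforming C, a hedged variant of the question's buf_avail (sketch only) that avoids the null-null subtraction entirely:
#include <stddef.h>

struct buf {
    char *buf;
    char *pwrite;
    char *pread;
};

/* Guarding the uninitialized (both-NULL) state keeps the subtraction strictly
   conforming in C; in C++03 and later the guard is unnecessary, since
   subtracting two null pointers is defined to yield 0. */
size_t buf_avail(const struct buf *b)
{
    if (b->pwrite == NULL && b->pread == NULL)
        return 0;
    return (size_t)(b->pwrite - b->pread);
}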
The C Standard does not impose any requirements on the behavior in this case, but many implementations do specify the behavior of pointer arithmetic in many cases beyond the bare minimums required by the Standard, including this one.
On any conforming C implementation, and nearly all (if not all) implementations of C-like dialects, the following guarantees will hold for any pointer p such that either *p or *(p-1) identifies some object:
For any integer value z that equals zero, the pointer values (p+z) and (p-z) will be equivalent in every way to p, except that they will only be constant if both p and z are constant.
For any q which is equivalent to p, the expressions p-q and q-p will both yield zero.
Having such guarantees hold for all pointer values, including null, may eliminate the need for some null checks in user code. Further, on most platforms, generating code that upholds such guarantees for all pointer values without regard for whether they are null would be simpler and cheaper than treating nulls specially. Some platforms, however, may trap on attempts to perform pointer arithmetic with null pointers, even when adding or subtracting zero. On such platforms, the number of compiler-generated null checks that would have to be added to pointer operations to uphold the guarantee would in many cases vastly exceed the number of user-generated null checks that could be omitted as a result.
If there were an implementation where the cost of upholding the guarantees would be great, but few if any programs would receive any benefit from them, it would make sense to allow it to trap "null+zero" computations, and require that user code for such an implementation include the manual null checks that the guarantees could have made unnecessary. Such an allowance was not expected to affect the other 99.44% of implementations, where the value of upholding the guarantees would exceed the cost. Such implementations should uphold such guarantees, but their authors shouldn't need the authors of the Standard to tell them that.
The authors of C++ have decided that conforming implementations must uphold the above guarantees at any cost, even on platforms where they could substantially degrade the performance of pointer arithmetic. They judged that the value of the guarantees even on platforms where they would be expensive to uphold would exceed the cost. Such an attitude may have been affected by a desire to treat C++ as a higher-level language than C. A C programmer could be expected to know when a particular target platform would handle cases like (null+zero) in unusual fashion, but C++ programmers weren't expected to concern themselves with such things. Guaranteeing a consistent behavioral model was thus judged to be worth the cost.
Of course, nowadays questions about what is "defined" seldom have anything to do with what behaviors a platform can support. Instead, it is now fashionable for compilers to--in the name of "optimization"--require that programmers manually write code to handle corner cases which platforms would previously have handled correctly. For example, if code which is supposed to output n characters starting at address p is written as:
void out_characters(unsigned char *p, int n)
{
    unsigned char *end = p+n;
    while(p < end)
        out_byte(*p++);
}
older compilers would generate code that would reliably output nothing, with no side-effect, if p==NULL and n==0, with no need to special-case n==0. On newer compilers, however, one would have to add extra code:
void out_characters(unsigned char *p, int n)
{
    if (n)
    {
        unsigned char *end = p+n;
        while(p < end)
            out_byte(*p++);
    }
}
which an optimizer may or may not be able to get rid of. Failing to include the extra code may cause some compilers to figure that since p "can't possibly be null", any subsequent null pointer checks may be omitted, thus causing the code to break in a spot unrelated to the actual "problem".
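Here is a sketch of the kind of caller that last paragraph has in mind (send_message and report_error are made-up names; whether a given compiler actually performs this deduction depends on inlining and optimization level):
#include <stddef.h>

void out_characters(unsigned char *p, int n);   /* as defined above */
void out_byte(unsigned char b);                 /* as in the answer above */
void report_error(const char *msg);             /* made up for illustration */

void send_message(unsigned char *msg, int n)
{
    out_characters(msg, n);   /* computes msg + n even when n == 0: UB if msg is null */
    if (msg == NULL) {        /* a compiler may assume msg != NULL and delete this check */
        report_error("null message");
        return;
    }
    out_byte(msg[0]);         /* with the check deleted, a null msg faults here instead */
}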
I compiled Qt in 64 bit. My code is also compiled in 64 bit. I initialize a (pointer) member variable to zero. When I inspect it, XCode tells me that its value is not 0 but 0xffffffff00000000.
Is this a sign of a mix-up between 32 and 64? How might the 32 bit initialization have crept into the executable when both the library and my code have 'g++ .. -arch x86_64 -Xarch_x86_64 .. '? In case it matters, I am on Snow Leopard.
----Begin-Edit----
I appreciate finding out after all these years that the standard does not impose the value 0x00..00 when one assigns 0 to a pointer, but this is not the issue in this case.
#include <stdio.h>
int main()
{
const char * c = "Foo";
printf("Pointers in this executable use %lu bytes.\n", sizeof(c));
void * z = 0;
printf("A zero pointer in this executable is %p\n", z);
}
If I save the code above in '32_or_64.cpp' then compile it with 'g++ -arch i386 32_or_64.cpp', I get
Pointers in this executable use 4 bytes.
A zero pointer in this executable is 0x0
If I compile it with 'g++ -arch x86_64 32_or_64.cpp', I get
Pointers in this executable use 8 bytes.
A zero pointer in this executable is 0x0
If you believe that this does not establish that assigning 0 on my particular configuration should show up as precisely 0 when debugging in x86_64, please point it out. Otherwise, debating 'null' is a wonderful discussion, but an irrelevant one in this thread.
----End-Edit----
Update: this explanation seems bogus in the light of π's edit. But you might find it interesting anyway.
In C-like languages, a pointer value written as 0 in the source code is just a convention for specifying a null pointer. A null pointer is a pointer that is guaranteed not to point to any object, and it is defined to test equal to the integer zero, but it doesn't need to have the same internal representation as the integer zero. Null pointers can have a variety of representations, depending on the architecture, or even on the type of the pointer.
The use of 0 to mean "null pointer" is perhaps an unfortunate convention; the level of confusion it causes is perhaps best indicated by the length of Steve Summit's C programming language FAQ on the subject.
hexa's comment is, I think, evidence of the difficulty of understanding this convention. The trouble is that there are three ideas to be separated:
The concept of a null pointer: a pointer that's distinct from a pointer to any object.
The representation of a null pointer on a machine (in some cases by the address 0x00000000, but that's not something you can or should rely on).
How you can create and test null pointers in C-like languages (by using a null pointer constant like 0 or NULL).
Here's the C++ standard, section 4.10:
A null pointer constant is an integral constant expression rvalue of integer type that evaluates to zero. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of pointer to object or pointer to function type. Two null pointer values of the same type shall compare equal.
This guarantees that you can create a null pointer using the constant 0, and test whether a pointer is null by comparison with 0, but says nothing about the machine representation of the null pointer.
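A minimal C sketch of the distinction between the null pointer constant 0 and an all-bits-zero object representation (on mainstream platforms the two happen to coincide, so the second branch is not guaranteed by the standard):
#include <stdio.h>
#include <string.h>

int main(void) {
    int *p = 0;                 /* null pointer constant: p is guaranteed to be a null pointer */
    int *q;
    memset(&q, 0, sizeof q);    /* all-bits-zero representation: not guaranteed to be null */

    if (p == 0)
        puts("p is a null pointer (guaranteed)");
    if (q == 0)
        puts("q compares equal to null only because this platform represents null as all-zero bits");
    return 0;
}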
It's perfectly possible that your this pointer is not pointing to the correct memory. If your program exhibits other undefined behaviour, then it's perfectly possible that this is just random garbage memory.