Throughout various code, I have seen memory allocation in debug builds with NULL...
memset(ptr,NULL,size);
Or with 0xDEADBEEF...
memset(ptr,0xDEADBEEF,size);
What are the advantages to using each one, and what is the generally preferred way to achieve this in C/C++?
If a pointer was assigned a value of 0xDEADBEEF, couldn't it still dereference to valid data?
Using either memset(ptr, NULL, size) or memset(ptr, 0xDEADBEEF, size) is a clear indication of the fact that the author did not understand what they were doing.
Firstly, memset(ptr, NULL, size) will indeed zero-out a memory block in C and C++ if NULL is defined as an integral zero.
However, using NULL to represent the zero value in this context is not an acceptable practice. NULL is a macro introduced specifically for pointer contexts. The second parameter of memset is an integer, not a pointer. The proper way to zero-out a memory block would be memset(ptr, 0, size). Note: 0 not NULL. I'd say that even memset(ptr, '\0', size) looks better than memset(ptr, NULL, size).
Moreover, the most recent (at the moment) C++ standard - C++11 - allows defining NULL as nullptr. The nullptr value is not implicitly convertible to type int, which means that the above code is not guaranteed to compile in C++11 and later.
In C language (and your question is tagged C as well) macro NULL can expand to (void *) 0. Even in C (void *) 0 is not implicitly convertible to type int, which means that in general case memset(ptr, NULL, size) is simply invalid code in C.
Secondly, even though the second parameter of memset has type int, the function interprets it as an unsigned char value. It means that only the lowest byte of the value is used to fill the destination memory block. For this reason memset(ptr, 0xDEADBEEF, size) will compile, but will not fill the target memory region with 0xDEADBEEF values, as the author of the code probably naively hoped. memset(ptr, 0xDEADBEEF, size) is equivalent to memset(ptr, 0xEF, size) (assuming 8-bit chars). While this is probably good enough to fill some memory region with intentional "garbage", things like memset(ptr, NULL, size) or memset(ptr, 0xDEADBEEF, size) still betray a major lack of professionalism on the author's part.
Again, as other answers have already noted, the idea here is to fill the unused memory with a "garbage" value. Zero is certainly not a good idea in this case, since it is not "garbagy" enough. When using memset you are limited to one-byte values, like 0xAB or 0xEF. If this is good enough for your purposes, use memset. If you want a more expressive and unique garbage value, like 0xDEADBEEF or 0xBAADF00D, you won't be able to use memset with it. You'll have to write a dedicated function that can fill a memory region with a 4-byte pattern, as sketched below.
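For illustration, here is a minimal sketch of such a function in C. The name fill_pattern32 is invented for this example, and the byte order of the pattern in memory follows the machine's endianness, so on little-endian hardware 0xDEADBEEF shows up as the bytes EF BE AD DE:

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Fill a memory block with a repeating 4-byte pattern such as 0xDEADBEEF,
   which memset cannot do (it repeats a single byte). */
static void fill_pattern32(void *dst, uint32_t pattern, size_t size)
{
    unsigned char *p = dst;
    while (size >= sizeof pattern) {
        memcpy(p, &pattern, sizeof pattern);  /* copy 4 bytes at a time */
        p    += sizeof pattern;
        size -= sizeof pattern;
    }
    memcpy(p, &pattern, size);                /* trailing 1..3 bytes, if any */
}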
A pointer in C and C++ cannot be assigned an arbitrary integer value (other than a Null Pointer Constant, i.e. zero). Such assignment can only be achieved by forcing the integral value into the pointer with an explicit cast. Formally speaking, the result of such a cast is implementation defined. The resultant value can certainly point to valid data.
Writing 0xDEADBEEF or another non-zero bit pattern is a good idea to be able to catch both write-after-delete and read-after-delete uses.
1) Write after delete
By writing a specific pattern you can check whether a block that has already been deallocated was written over later by buggy code; in our debug memory manager we keep a free list of blocks, and before recycling a memory block we check that our custom pattern is still present throughout the block. Of course it's sort of "late" when we discover the problem, but still much earlier than it would be discovered without the check.
We also have a special function that is called periodically (and can be called on demand) which goes through the list of all freed memory blocks and checks their consistency, so we can call this function often when chasing a bug. Using 0x00000000 as the value wouldn't be as effective, because zero may well be exactly the value that buggy code wants to write into the already-deallocated block, e.g. zeroing a field or setting a pointer to NULL (it's much less likely that the buggy code wants to write 0xDEADBEEF).
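As a rough sketch of the kind of consistency check described above (this is not the actual memory manager's code; the fill byte and the function name are just illustrative):

#include <stddef.h>

#define FREED_FILL 0xEF   /* byte written over a block when it is freed */

/* Returns nonzero if a freed block still contains only the fill pattern,
   i.e. nobody has written to it after deallocation. */
static int freed_block_is_intact(const unsigned char *block, size_t size)
{
    for (size_t i = 0; i < size; ++i)
        if (block[i] != FREED_FILL)
            return 0;     /* write-after-free detected */
    return 1;
}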
2) Read after delete
Leaving the content of a deallocated block untouched, or even writing just zeros, increases the possibility that someone reading the content of a dead memory block will still find the values reasonable and compatible with invariants (e.g. a NULL pointer, since on many architectures NULL is just binary zeroes, or the integer 0, the ASCII NUL char, or a double value 0.0).
By writing "strange" patterns like 0xDEADBEEF instead, most code that accesses those bytes in read mode will probably find strange, unreasonable values (e.g. the integer -559038737 or a double with value -1.1885959257070704e+148), hopefully triggering some other self-consistency check assertion.
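To see where the -559038737 comes from, reinterpreting the 4-byte pattern as a signed 32-bit integer gives exactly that value; a tiny standalone check:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint32_t pattern = 0xDEADBEEF;
    int32_t as_int;

    memcpy(&as_int, &pattern, sizeof as_int);  /* reinterpret the same 4 bytes */
    printf("%d\n", as_int);                    /* prints -559038737 */
    return 0;
}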
Of course nothing here is really specific to the bit pattern 0xDEADBEEF; we actually use different patterns for freed blocks, the before-block area and the after-block area, and our memory manager also writes another (address-dependent) bit pattern to the content part of any memory block before giving it to the application (this helps find uses of uninitialized memory).
I would definitely recommend 0xDEADBEEF. It clearly identifies uninitialized variables, and accesses to uninitialized pointers.
Because the value is odd (misaligned), dereferencing a 0xDEADBEEF pointer will definitely crash on the PowerPC architecture when loading a word, and will very likely crash on other architectures since the memory is likely to be outside the process' address space.
Zeroing out memory is a convenience since many structures/classes have member variables that use 0 as their initial value, but I would very much recommend initializing each member in the constructor rather than using the default memory fill. You will really want to be on top of whether or not you properly initialized your variables.
http://en.wikipedia.org/wiki/Hexspeak
These "magic" numbers are are a debugging aid to identify bad pointers, uninitialized memory etc. You want a value that is unlikely to occur during normal execution and something that is visible when doing memory dumps or inspecting variables. Initializing to zero is less useful in this regard. I would guess that when you see people initialize to zero it is because they need to have that value at zero. A pointer with a value of 0xDEADBEEF could point to a valid memory location so it's a bad idea to use that as an alternative to NULL.
One reason that you null the buffer or set it to a special value is that you can easily tell whether the buffer contents is valid or not in the debugger.
Dereferencing a pointer with the value 0xDEADBEEF is almost always dangerous (it will probably crash your program/system) because in most cases you have no idea what is stored there.
DEADBEEF is an example of Hexspeak. With it, as a programmer you intentionally convey an error condition.
I would personally recommend using NULL (or 0x0), as it represents null as expected and comes in handy for comparisons. Imagine you are using a char * that somehow ends up holding 0xDEADBEEF for some reason (don't know why); with NULL, at least your debugger will very handily tell you that it's 0x0.
I would go for NULL because it's much easier to mass zero out memory than to go through later and set all the pointers to 0xDEADBEEF. In addition, there's nothing at all stopping 0xDEADBEEF from being a valid memory address on x86; admittedly, it would be unusual, but far from impossible. NULL is more reliable.
Ultimately, look: NULL is the language convention. 0xDEADBEEF just looks pretty, and that's it. You gain nothing from it. Libraries will check for NULL pointers; they don't check for 0xDEADBEEF pointers. In C++ the idea of the null pointer isn't even tied to a zero bit pattern, it's just written as the literal zero, and in C++11 (formerly C++0x) there are nullptr and nullptr_t.
Vote me down if this is too opinion-y for StackOverflow but I think this whole discussion is a symptom of a glaring hole in the toolchain we use to make software.
Detecting uninitialized variables by initializing memory with "garbage-y" values detects only some kinds of errors in some kinds of data.
And detecting uninitialized variables in debug builds but not in release builds is like following safety procedures only when testing an aircraft and telling the flying public to be satisfied with "well, it tested OK".
WE NEED HARDWARE SUPPORT for detecting uninitialized variables. Something like an "invalid" bit that accompanies every addressable unit of memory (a byte on most of our machines), which is set by the OS on every byte that VirtualAlloc() (et al., or equivalents on other OSes) hands over to applications, which is automatically cleared when the byte is written to, but which causes an exception if the byte is read first.
Memory is cheap enough for this and processors are fast enough for this. It would end reliance on "funny" patterns and keep us all honest to boot.
Note that the second argument to memset is treated as a byte: it is converted to unsigned char internally. 0xDEADBEEF would on most platforms convert to 0xEF (and to something else on some odd platform).
Also note that the second argument is formally supposed to be an int, which NULL isn't required to be.
Now for the advantages of this kind of initialization. First of all, the behavior becomes more deterministic (even if we end up in undefined behavior, in practice the behavior will be consistent).
Having deterministic behavior means that debugging becomes easier: once you have found a bug, you "only" have to provide the same input and the fault will manifest itself.
Now when you select which value to use, you should pick one that will most likely result in bad behavior, meaning that use of the uninitialized data is more likely to produce an observable fault. This means you have to use some knowledge of the platform in question (though many of them behave quite similarly).
If the memory is used to hold pointers, then clearing the memory means you get a NULL pointer, and dereferencing that will normally result in a segmentation fault (which will be observed as a fault). However, if you use it in another way, for example as an arithmetic type, you will get 0, and for many applications that is not an unusual value at all.
If you instead use 0xDEADBEEF you will get a quite large integer; when interpreting the data as floating point it will also be a number of quite large magnitude (IIRC). If interpreting it as text it will be very long and contain non-ASCII characters, and in UTF-8 encoding it will likely be invalid. Used as a pointer, on some platforms it would fail alignment requirements for some types, and on some platforms that region of memory might not be mapped at all (note that on x86_64 the pointer value would be 0xDEADBEEFDEADBEEF, which is outside the canonical address range).
Note that while filling with 0xEF will have pretty much the same properties, if you want to fill the memory with 0xDEADBEEF you need a custom function, since memset doesn't do the trick.
Related
How can I create a reserved pointer value?
The context is this: I have been thinking of how to implement a data structure for a dynamic scripting language (I am not planning on implementing this - just wondering how it would be done).
Strings may contain arbitrary bytes, including NUL. Thus, it is necessary to store the length separately. This requires a pointer (to point to the array) and a number. The first trick is that if the pointer is NULL, it cannot possibly be a valid string, so the number can be used for an actual integer.
If a second reserved pointer value could be created, this could be used to imply that the other field is now being used as a floating-point value. Can this be done?
One thought is to mmap() an address with no permissions, which could also be done to replace the usage of the NULL pointer.
On any modern system, you can just use the pointer values 1, 2, ... 4095 for such purposes. Another frequent choice is (uintptr_t)-1, which is technically inferior, but used more frequently than 1 nevertheless.
Why are these values "safe"?
Modern systems safeguard against NULL pointer accesses by making it impossible to map anything at virtual address zero. Almost any dereference of a NULL pointer will hit this nonexistent region, and the hardware will tell the OS that something bad happened, which triggers the OS to segfault the process.
Since virtual memory mappings are page aligned (pages are at least 4k on current hardware), and nothing is mapped at address zero, nothing can be mapped anywhere in the range 0, ..., 4095; all these addresses are protected in the same way, and you can use them as special-purpose values.
How much virtual memory space is reserved for this purpose is a system parameter; on Linux it is controlled by /proc/sys/vm/mmap_min_addr, and the root user can change it to zero, which would disable this protection (which would not be a very smart idea). The default on Ubuntu is 64k (i.e. 16 pages).
This is also the reason why (uintptr_t)-1 is less safe than 1; even though any load of more than one byte will hit the zero page, the address (uintptr_t)-1 itself is not necessarily protected in this way. Consequently, doing string operations on (char*)-1 does not necessarily segfault.
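To connect this back to the question, here is a minimal sketch of how a reserved low address could serve as a second sentinel next to NULL. The struct layout and names are invented for the example, and it assumes the null-page protection described above:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_INT    ((char *) 0)  /* NULL: the other field holds an integer */
#define TAG_DOUBLE ((char *) 1)  /* reserved low address: field holds a double */

typedef struct {
    char *data;                  /* a sentinel above, or a pointer to the bytes */
    union { intptr_t i; double d; size_t len; } u;
} value;

static void print_value(const value *v)
{
    if (v->data == TAG_INT)
        printf("int: %ld\n", (long) v->u.i);
    else if (v->data == TAG_DOUBLE)
        printf("double: %g\n", v->u.d);
    else
        printf("string of %zu bytes\n", v->u.len);
}

int main(void)
{
    value a = { TAG_INT,    { .i = 42  } };
    value b = { TAG_DOUBLE, { .d = 3.5 } };
    print_value(&a);
    print_value(&b);
    return 0;
}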
Edit:
My original explanation with the special mapping seems to have been a bit stale; probably this was the way things were handled on the old Mac/PPC platform. Even though the effect is pretty much the same, I changed the details of the answer to reflect modern Linux. Anyway, the important point is not how the null page protection is achieved; the important point is that any sane, modern system will have some null page protection that encompasses at least the mentioned address range. Some more details can be found in this SO answer: https://stackoverflow.com/a/12645890/2445184
In standard C (and standard C++), the approach that's 100% valid and works is simple: declare a variable, use its address as a magic value.
char magic;              /* its address serves as the magic value */
char *ptr = &magic;      /* mark ptr with the sentinel */
if (ptr == &magic) { ... }
This guarantees that magic will never have any overlap with another object.
Magic pointer values such as (char *) 1 have their advantages too, but they are easy to get wrong. Even if you disregard the theoretical implementations where (char *) 1 may be a valid object: if you use (int *) 1 as a magic pointer value and the optimiser assumes int * values are suitably aligned, it may remove checks that are no-ops only in 100% valid code, not in your code. So I'd recommend the standard approach, and optionally switch to magic pointer values temporarily, only if you find they help you debug.
mmap-ing an address can fail if the address is already in use. It would probably be better to use the address of some static variable or function, or to obtain a unique address via malloc(1).
I know NULL (0x00000000) is a pointer to nothing because the OS doesn't allow the process to allocate any memory at this location. But if I use 0x00000001 (a magic number or code pointer), is it safe to assume as well that the OS won't allow memory to be allocated here?
If so then until where is it safe to assume that?
Standard (first)
The Standard only guarantees that 0 is a sentinel value as far as pointers go. The underlying memory representation is no way guaranteed; it's implementation defined.
Using a pointer set to that sentinel value for anything else than reading the pointer state or writing a new state (which includes dereferencing or pointer arithmetic) is undefined behavior.
Virtual Memory
In the days of virtual memory (i.e., each process gets its own memory space, independent of the others), a null pointer is most often indeed represented as 0 in the process memory space. I don't actually know of any architecture that does otherwise, though I imagine that on mainframes it may not be so.
Unix
In the Unix world, it is typical to reserve all the address space below 0x8000 for null values. The memory is not really allocated, it is just protected (i.e., placed in a special mode), so that the OS will trigger a segmentation fault should you ever try to read from it or write to it.
The idea of using such a range is that a null pointer is not necessarily used as is. For example, if you have a std::pair<int, int>* p = 0; (which is null) and access p->second, then the compiler will perform the arithmetic necessary to point to second (i.e. +4, generally) and attempt to access the memory at 0x4 directly. The problem is obviously compounded by arrays.
In practice, this 0x8000 limit should be practical enough to detect most issues (and avoid memory corruption or others). In this case, this means that you avoid the undefined behavior and get a "proper" crash. However, should you be using a large array you could overshoot it, so it's not a silver bullet.
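A C analogue of the std::pair example (the struct and its name are purely illustrative) shows why member access through a null pointer lands at a small nonzero address rather than 0 itself:

#include <stddef.h>
#include <stdio.h>

struct pair_ii { int first; int second; };

int main(void)
{
    /* The compiler turns p->second into "value of p + offset of second". */
    printf("offset of second: %zu\n", offsetof(struct pair_ii, second)); /* typically 4 */

    struct pair_ii *p = NULL;
    /* p->second would therefore access address 0 + 4, i.e. 0x4, which is still
       inside the protected low region, so it faults instead of reading data.  */
    /* (void) p->second;   <- undefined behavior, deliberately left commented out */
    (void) p;
    return 0;
}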
The particular limit of your implementation or compiler/runtime stack can be determined either through documentation or by successive trials. There might even be a way to tweak it.
You should not assume anything about the actual values of pointers. Especially, the null pointer is not required to be represented by a zero address, even though the literal 0 does look like a zero.
The only valid range is supposed to be the range allocated to you by the OS. Anything else should be denied by the OS.
An exception to that rule is the shared memory.
The C++ standard doesn't "reserve" any pointer addresses other than zero (null). So it is not safe to use 1 or any other value as a "magic" pointer value. Of course, in practice, some implementations of C++ probably never use certain values, but you don't get any guarantees from the language definition.
I will try to give a broad view about this:
You will probably never access real memory addresses directly, because of the multiple sandboxing mechanisms that every modern OS puts in place.
What is a NULL pointer from the software viewpoint? A NULL pointer is a pointer variable that stores a value that the programmer picks as a meaningful label with the meaning "this pointer goes nowhere". A NULL pointer does not point to 0x000000 by definition; the definition of a NULL pointer is not about where that pointer points, but about the value of the macro called NULL, and that value is what a NULL pointer holds.
In C you can assume that NULL == 0; in C, NULL is a macro that is defined as an int equal to 0. In C++ you do not have this liberty.
There are types, labels and values (or better, representations of values, not real values) for every variable, at least for primitive types, and the same goes for pointers. If you are speaking about void pointers, you are speaking about pointers that contain a memory address (just like any pointer); the only special thing about these pointers is that they need a cast in C++ to be dereferenced safely and effectively. It is a big mistake to think of void* as a pointer that points to nowhere, or to 0, or to NULL, or to 0x0000000.
By the way, I still don't get your problem ...
A modern OS is likely to reserve at least one page for NULL pointer. So 0x1 (or 0x4 if you want 32-bit alignment) is likely to work.
But remember this is not guaranteed by C/C++ language. You would have to rely on your OS and compiler for such behavior.
Furthermore, there's no guarantee about the actual value of the NULL pointer. It may or may not be all zeros. If it's not, your trick won't work at all.
The following example is from Wikipedia.
int arr[4] = {0, 1, 2, 3};
int* p = arr + 5; // undefined behavior
If I never dereference p, then why is arr + 5 alone undefined behaviour? I expect pointers to behave as integers - with the exception that when dereferenced the value of a pointer is considered as a memory address.
That's because pointers don't behave like integers. It's undefined behavior because the standard says so.
On most platforms, however (if not all), you won't get a crash or run into dubious behavior if you don't dereference the pointer. But then, if you don't dereference it, what's the point of doing the addition?
That said, note that an expression going one over the end of an array is technically 100% "correct" and guaranteed not to crash per §5.7 ¶5 of the C++11 spec. However, the result of that expression is unspecified (just guaranteed not to be an overflow); while any other expression going more than one past the array bounds is explicitly undefined behavior.
Note: That does not mean it is safe to read and write from an over-by-one offset. You likely will be editing data that does not belong to that array, and will cause state/memory corruption. You just won't cause an overflow exception.
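For contrast, here is the one legitimate use of a one-past-the-end pointer: it may be computed and compared, but never dereferenced.

#include <stdio.h>

int main(void)
{
    int arr[4] = {0, 1, 2, 3};
    int *end = arr + 4;                 /* one past the end: valid to form and compare */

    for (int *p = arr; p != end; ++p)   /* 'end' itself is never dereferenced */
        printf("%d\n", *p);

    /* int *bad = arr + 5;                 already undefined behavior, even unread */
    return 0;
}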
My guess is that it's like that because it's not only dereferencing that's wrong. Also pointer arithmetics, comparing pointers, etc. So it's just easier to say don't do this instead of enumerating the situations where it can be dangerous.
The original x86 can have issues with such statements. In 16-bit code, pointers are 16+16 bits. If you add an offset to the lower 16 bits, you might need to deal with overflow and change the upper 16 bits. That was a slow operation and best avoided.
On those systems, array_base+offset was guaranteed not to overflow, if offset was in range (<=array size). But array+5 would overflow if array contained only 3 elements.
The consequence of that overflow is that you got a pointer which doesn't point behind the array, but before. And that might not even be RAM, but memory-mapped hardware. The C++ standard doesn't try to limit what happens if you construct pointers to random hardware components, i.e. it's Undefined Behavior on real systems.
If arr happens to be right at the end of the machine's memory space then arr+5 might be outside that memory space, so the pointer type might not be able to represent the value i.e. it might overflow, and overflow is undefined.
"Undefined behavior" doesn't mean it has to crash on that line of code, but it does mean that you can't make any guaranteed about the result. For example:
int arr[4] = {0, 1, 2, 3};
int* p = arr + 5; // I guess this is allowed to crash, but that would be a rather
// unusual implementation choice on most machines.
*p; //may cause a crash, or it may read data out of some other data structure
assert(arr < p); // this statement may not be true
// (arr may be so close to the end of the address space that
// adding 5 overflowed the address space and wrapped around)
assert(p - arr == 5); //this statement may not be true
//the compiler may have assigned p some other value
I'm sure there are many other examples you can throw in here.
Some systems, very rare systems that I can't name, will trap when you increment past boundaries like that. Further, the rule allows implementations that provide bounds protection to exist... though again, I can't think of one.
Essentially, you shouldn't be doing it, and therefore there's no reason to specify what happens when you do. Specifying what happens would put an unwarranted burden on the implementation provider.
This result you are seeing is because of the x86's segment-based memory protection. I find this protection justified: when you increment a pointer address and store it, at some future point in your code you will be dereferencing the pointer and using the value. So the compiler wants to avoid situations where you end up changing someone else's memory location or freeing memory that is owned by some other part of your code. To avoid such scenarios, the restriction has been put in place.
In addition to hardware issues, another factor was the emergence of implementations which attempted to trap on various kinds of programming errors. Although many such implementations could be most useful if configured to trap on constructs which a program is known not to use, even though they are defined by the C Standard, the authors of the Standard did not want to define the behavior of constructs which would--in many programming fields--be symptomatic of errors.
In many cases, it will be much easier to trap on actions which use pointer arithmetic to compute address of unintended objects than to somehow record the fact that the pointers cannot be used to access the storage they identify, but could be modified so that they could access other storage. Except in the case of arrays within larger (two-dimensional) arrays, an implementation would be allowed to reserve space that's "just past" the end of every object. Given something like doSomethingWithItem(someArray+i);, an implementation could trap any attempt to pass any address which doesn't point to either an element of the array or the space just past the last element. If the allocation of someArray reserved space for an extra unused element, and doSomethingWithItem() only accesses the item to which it receives a pointer, the implementation could relatively inexpensively ensure that any non-trapped execution of the above code could--at worst--access otherwise-unused storage.
The ability to compute "just-past" addresses makes bounds checking more difficult than it otherwise would be (the most common erroneous situation would be passing doSomethingWithItem() a pointer just past the end of the array, but the behavior would be defined unless doSomethingWithItem tried to dereference that pointer--something the caller may be unable to prove). Because the Standard would allow compilers to reserve space just past the array in most cases, however, such an allowance would let implementations limit the damage caused by untrapped errors--something that would likely not be practical if more generalized pointer arithmetic were allowed.
I understand the purpose of the NULL constant in C/C++, and I understand that it needs to be represented some way internally.
My question is: Is there some fundamental reason why the 0-address would be an invalid memory-location for an object in C/C++? Or are we in theory "wasting" one byte of memory due to this reservation?
The null pointer does not actually have to be 0. It's guaranteed in the C spec that when a constant 0 value is given in the context of a pointer it is treated as null by the compiler, however if you do
char *foo = (char *) 1;
--foo;
// do something with foo
You will access the 0-address, not necessarily the null pointer. In most cases this happens to actually be the case, but it's not necessary, so we don't really have to waste that byte. Although, in the larger picture, if it isn't 0, it has to be something, so a byte is being wasted somewhere
Edit: Edited out the use of NULL due to the confusion in the comments. Also, the main message here is "null pointer != 0, and here's some C/pseudo code that shows the point I'm trying to make." Please don't actually try to compile this or worry about whether the types are proper; the meaning is clear.
This has nothing to do with wasting memory and more with memory organization.
When you work with the memory space, you have to assume that anything not directly "Belonging to you" is shared by the entire system or illegal for you to access. An address "belongs to you" if you have taken the address of something on the stack that is still on the stack, or if you have received it from a dynamic memory allocator and have not yet recycled it. Some OS calls will also provide you with legal areas.
In the good old days of real mode (e.g., DOS), all the beginning of the machine's address space was not meant to be written by user programs at all. Some of it even mapped to things like I/O.
For instance, writing to the address space at segment 0xB800 (fairly low) would actually let you manipulate the screen directly! Nothing was ever placed at address 0, and many memory controllers would not let you access it, so it was a great choice for NULL. In fact, the memory controller on some PCs would have gone bonkers if you tried writing there.
Today the operating system protects you with a virtual address space. Nevertheless, no process is allowed to access addresses not allocated to it. Most of the addresses are not even mapped to an actual memory page, so accessing them will trigger a general protection fault or the equivalent in your operating system. This is why 0 is not wasted - even though all the processes on your machine "have an address 0", if they try to access it, it is not mapped anywhere.
There is no requirement that a null pointer be equal to the 0-address, it's just that most compilers implement it this way. It is perfectly possible to implement a null pointer by storing some other value and in fact some systems do this. The C99 specification §6.3.2.3 (Pointers) specifies only that an integer constant expression with the value 0 is a null pointer constant, but it does not say that a null pointer when converted to an integer has value 0.
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
On some embedded systems the zero memory address is used for something addressable.
The zero address and the NULL pointer are not (necessarily) the same thing. Only a literal zero is a null pointer. In other words:
char* p = 0; // p is a null pointer
char* q = (char*) 1;
q--; // q is NOT necessarily a null pointer
Systems are free to represent the null pointer internally in any way they choose, and this representation may or may not "waste" a byte of memory by making the actual 0 address illegal. However, a compiler is required to convert a literal zero pointer into whatever the system's internal representation of NULL is. A pointer that comes to point to the zero address by some way other than being assigned a literal zero is not necessarily null.
Now, most systems do use 0 for NULL, but they don't have to.
It is not necessarily an illegal memory location. I have stored data by dereferencing a pointer to zero... it happened that the datum was an interrupt vector, stored in the vector located at address zero.
By convention it is not normally used by application code since historically many systems had important system information starting at zero. It could be the boot rom or a vector table or even unused address space.
On many processors address zero is the reset vector, wherein lies the bootrom (BIOS on a PC), so you are unlikely to be storing anything at that physical address. On a processor with an MMU and a supporting OS, the physical and logical address addresses need not be the same, and the address zero may not be a valid logical address in the executing process context.
NULL is typically the zero address, but it is the zero address in your applications virtual address space. The virtual addresses that you use in most modern operating systems have exactly nothing to do with actual physical addresses, the OS maps from the virtual address space to the physical addresses for you. So, no, having the virtual address 0 representing NULL does not waste any memory.
Read up on virtual memory for a more involved discussion if you're curious.
I don't see the answers directly addressing what I think you were asking, so here goes:
Yes, at least one address value is "wasted" (made unavailable for use) because of the constant used for null. Whether it maps to 0 in the linear map of process memory is not relevant.
And the reason that address won't be used for data storage is that you need the special status of the null pointer in order to distinguish it from any other real pointer. Just as with ASCIIZ strings (C strings, NUL-terminated), where the NUL character is designated as the end of the character string and cannot be used inside strings. Can you still use it inside? Yeah, but that will mislead library functions as to where the string ends.
I can think of at least one implementation of LISP I was learning in which NIL (Lisp's null) was not 0, nor was it an invalid address, but a real object. The reason was very clever: the standard requires that CAR(NIL) = NIL and CDR(NIL) = NIL (note: CAR(l) returns a pointer to the head/first element of a list, while CDR(l) returns a pointer to the tail/rest of the list). So instead of adding if-checks in CAR and CDR for whether the pointer is NIL, which would slow down every call, they just allocated a CONS (think list cell) and assigned its head and tail to point to itself. There! This way CAR and CDR work, and that address in memory won't be reused (because it is taken by the object devised as NIL).
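A rough C rendition of that trick (the names are made up here; real Lisp implementations differ in the details):

#include <stdio.h>

/* NIL is a real cons cell whose car and cdr point back to itself,
   so car() and cdr() need no NULL check. */
typedef struct cons {
    struct cons *car;
    struct cons *cdr;
} cons;

static cons nil_cell = { &nil_cell, &nil_cell };
#define NIL (&nil_cell)

static cons *car(cons *l) { return l->car; }   /* car(NIL) == NIL, no branch */
static cons *cdr(cons *l) { return l->cdr; }   /* cdr(NIL) == NIL, no branch */

int main(void)
{
    printf("%d\n", car(NIL) == NIL && cdr(NIL) == NIL);   /* prints 1 */
    return 0;
}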
P.S. I just remembered that many, many years ago I read about a Lattice C bug related to NULL. It must have been in the dark MS-DOS segmentation times, where you worked with separate code and data segments; the issue was that the first function from a linked library could end up at address 0, so a pointer to it would be considered invalid since it compared equal to NULL.
But since modern operating systems can map the physical memory to logical memory addresses (or better: modern CPUs starting with the 386), not even a single byte is wasted.
As people have already pointed out, the bit representation of the NULL pointer does not have to be the same as the bit representation of a 0 value. It is, though, in nearly all cases (the old dinosaur computers that had special addresses can be neglected), because a NULL pointer can also be used as a boolean, and using an integer (of sufficient size) to hold the pointer value is easier to represent in the common ISAs of modern CPUs. The code to handle it is then much more straightforward, and thus less error prone.
You are correct in noting that the address space at 0 is not usable storage for your program. For a number of reasons, a variety of systems do not consider this a valid address space for your program anyway.
Allowing any valid address to be used would require a null value flag for all pointers. This would exceed the overhead of the lost memory at address 0. It would also require additional code to check and see if the address were null or not, wasting memory and processor cycles.
Ideally, the address that NULL pointer is using (usually 0) should return an error on access. VAX/VMS never mapped a page to address 0 so following the NULL pointer would result in a failure.
The memory at that address is reserved for use by the operating system. 0 - 64k is reserved. 0 is used as a special value to indicate to developers "not a valid address".
I am wondering, what exactly is stored in memory when we say a particular pointer variable is NULL? Suppose I have a structure, say
typedef struct MEM_LIST MEM_INSTANCE;
struct MEM_LIST
{
char *start_addr;
int size;
MEM_INSTANCE *next;
};
MEM_INSTANCE *front;
front = (MEM_INSTANCE*)malloc(sizeof(MEM_INSTANCE));
-1) If I make front=NULL, what will be the value which actually gets stored in the different fields of front, say front->size and front->start_addr? Is it 0 or something else? I have limited knowledge of this NULL thing.
-2) If I do a free(front); it frees the memory which is pointed to by front. So what exactly does free mean here: does it make it NULL or make it all 0?
-3) What can be a good strategy to deal with initialization of pointers and freeing them?
Thanks in advance
There are many good contributed answers which adequately address the questions. However, the coverage of NULL is light.
In a modern virtual memory architecture, NULL points to memory for which any reference (that is, an attempt to read from or write to memory at that address) causes a segfault exception—also called an access violation or memory fault. This is an intentional protective mechanism to detect and deal appropriately with invalid memory accesses:
char *p = 0;
for (int j = 0; j < 50000000; ++j)
*(p += 1000000) = 10;
This code writes a ten at every millionth memory byte. It won't run for many loops—probably not even once. Either it will attempt to access an unmapped address, or it will attempt to modify read-only memory—where constant data or program code reside. The CPU will interrupt the instruction midway and report the exception to the operating system. Since there's no exception handling specified, the default o/s handling is to terminate the program. Linux displays Segmentation fault (for historical reasons). MS Windows is inconsistent, but tends to say access violation. The same should happen with any program in protected virtual memory doing this:
char *p = NULL;
if (p [34] == 'Y')
printf ("A miracle has occurred!\n");
This segfaults. A memory location near the NULL address is being dereferenced.
At the risk of confusion, it is possible that a large offset from zero will be valid memory. Thirty-four certainly won't be okay, but 34,000 might be. Different operating systems and different program processing tools (linkers) reserve a fixed amount of the zero end of memory and arrange for it to be unmapped. It could be as little as 1K, though 8K was a popular choice in the 1990s. Systems with ample virtual address space (not memory, but potential memory) might leave an 8M or 16M memory hole. Some modern operating systems randomize the amount of space reserved, as well as randomly varying the locations for the code and data sections each time a program starts.
The other extreme is non-virtual memory architectures. Typically, these provide valid addresses beginning at address zero up to the limit of installed memory. Such is common in embedded processors, many DSPs, and pre-protected mode CPUs, 8 and 16-bit processors like the 8086, 68000, etc. Reading a NULL address does not cause any special CPU reaction—it simply reads whatever is there, which is usually interrupt vectors. Writes to low memory usually result in hard-to-diagnose dire consequences as interrupt vectors are used asynchronously and infrequently.
Even stranger is the segment model of the oddly named "real-mode" x86 using small or medium memory model conventions. Such data addresses are 16 bits using the DS register which is set to the program's initialized data area. Dereferencing NULL accesses the first bytes of this space, but MSDOS programs contain ancient runtime structures for compatibility with CP/M, an o/s Fred Flintstone used. No exceptions, and maybe no consequences for modifying the memory near NULL in this environment. These were challenging bugs to find without program source code.
Virtual memory protection was a huge leap forward in creating stable systems and protecting programmers from themselves. Properly used NULLs provide significant safety and rapid debugging of programming flaws.
Wow, a lot of answers effectively claim that assigning NULL to a pointer sets it to point to the address 0, confusing value and representation. It does not. Setting a pointer to the value NULL or 0 is an abstract concept that sets the pointer to an invalid value not pointing to any valid object. The binary representation actually stored in memory does not need to be all bits 0. This is usually not an architecture thing; it is up to the compiler. In fact I had an old DOS compiler (on x86) that used all bits 1 for a NULL pointer.
Additionally, any pointer type is allowed to have its own binary representation for NULL, as long as all these pointers compare as equal when compared.
Granted, most of the time all bits are 0 for a NULL pointer for practical reasons, but it is not required. This means that using calloc() or memset(0) is not a portable way to initialize pointers.
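A minimal sketch of the portable alternative: assign NULL explicitly instead of relying on an all-bits-zero fill.

#include <stddef.h>
#include <stdlib.h>

int main(void)
{
    char *table[16];

    /* Portable: each element becomes a real null pointer, whatever its
       bit pattern is on this implementation. */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        table[i] = NULL;

    /* Not guaranteed portable for pointers: calloc/memset produce all-bits-zero,
       which, as noted above, need not be the null pointer representation. */
    char **heap_table = calloc(16, sizeof *heap_table);

    free(heap_table);
    return 0;
}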
NULL assigned to a pointer does not change the "fields pointed to by it".
In your case if you make front = NULL, front will no longer point to the structure allocated by your malloc, but will contain zero (NULL is 0 according to the C standard). Nothing will point to your allocated struct - it's a memory leak.
Note the critical distinction here between the pointer (front) and what it points to (the structure) - it's a big difference.
To answer your specific questions:
If you run front=NULL, front will no longer point to a MEM_INSTANCE structure, and hence front->size will have no meaning (it will probably crash the program)
If you do free(front) the OS will free the memory allocated to you for the MEM_INSTANCE structure. front will now point to memory that's no longer yours - and you can't access it
It's a broad question - please ask a more specific one.
Assignment to a pointer is not the same as assignment to the elements to which the pointer points. Assigning NULL to front will make it so that front points to nothing, but your allocated memory will be unaffected; it will not write any data into the fields formerly pointed to by front. Moreover, that is a memory leak.
Invoking free(front) will deallocate the block of memory but will not affect the value of front; in other words, front will point to a memory region that you no longer own and which is no longer valid for you to access. This is also known as a "dangling pointer", and it is generally a good idea to follow free(front) immediately with front=NULL so that you know that front is no longer valid.
A good strategy for dealing with pointers is, at least in C++, to use smart pointer classes and to perform allocation only in constructors and to perform deallocation only in destructors. Another good strategy is to ensure that you always assign NULL to any pointer that you have just freed. In C, you really just have to make sure that your allocations are matched properly with deallocations. It can also help to use "name_of_object_create" and "name_of_object_destroy" functions that parallel C++ constructors/destructors; however, there is no way in C to ensure automatic destruction.
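A small sketch of that create/destroy convention applied to the MEM_INSTANCE struct from the question (the function names here are hypothetical):

#include <stdlib.h>

/* Assumes the MEM_INSTANCE definition from the question is in scope. */
static MEM_INSTANCE *mem_instance_create(void)
{
    MEM_INSTANCE *m = malloc(sizeof *m);   /* sizeof *m, not sizeof a pointer */
    if (m) {
        m->start_addr = NULL;
        m->size = 0;
        m->next = NULL;
    }
    return m;
}

static void mem_instance_destroy(MEM_INSTANCE **m)
{
    if (m && *m) {
        free(*m);
        *m = NULL;    /* null the caller's pointer so it can't dangle */
    }
}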
NULL is a sentinel value that denotes that a pointer does not point to a meaningful location. I.e. "Do not attempt to dereference this".
So what exactly does free mean here: does it make it NULL or make it all 0?
Neither. It merely frees the memory block that front points to. The value in front remains as it was.
In C/C++ NULL == 0.
int* a = NULL;
int* b = 0;
The values stored in variables a and b will both be 0. There is no special magic to "NULL". The day this was explained to me, pointers suddenly made sense.
The definition of NULL is usually
#define NULL 0
or
#define NULL (void*)0
Assigning it to a pointer merely makes that pointer stop pointing at whatever memory address it was pointing at and point to memory address 0 instead. This is usually done at initialization or after a pointer's memory has been freed, though it isn't necessary. Setting a pointer equal to NULL does not deallocate memory or change any values of whatever it used to be pointing to.
Calling free() (in C) or delete (in C++) will deallocate the memory the pointer pointed to, but it will not set the pointer to NULL. Dereferencing the pointer after it's been freed is undefined behavior (i.e., it typically crashes). Therefore a common idiom is to set a pointer to NULL after it has been deallocated, to more easily catch erroneous dereferences later on.
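A minimal illustration of that idiom:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(64);
    if (!buf)
        return 1;

    strcpy(buf, "hello");

    free(buf);    /* releases the memory; buf itself still holds the old address */
    buf = NULL;   /* the idiom: a later *buf or buf[i] now traps on the null page
                     instead of silently touching freed memory */
    return 0;
}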
It depends on what you mean. A null traditionally means "no value".
In C, null generally means 0, so a null pointer points to address 0. However, if you actually have a piece of memory, it could contain anything left over from whatever used it last. If you clear that memory to (say) 0s and then say that the memory contains pointers, those pointers will be null.
You have to think about what a pointer is: it's simply a value that holds the address of some memory. So if the address is 0x1, the pointer points at the second byte in memory (remember, addressing traditionally starts at 0 for the first item, 1 for the second, and so on). If I write char *p = (char *) 0x1; I am saying that p points to the memory at address 0x1, and since I have declared it as char *, I'm saying I'm interested in a char-sized value at the memory p points to. So *p is the value of the second byte in memory.
for example:
take the following
struct somestruct { char p; };
// this means that I've got a somestruct at location null (0x00000)
struct somestruct *ptrToSomeStruct = NULL;
so ptrToSomeStruct->p says: take the contents of where ptrToSomeStruct points (0x00000) and treat whatever is there as the value of a char, so you are reading the first byte in memory
now if I declare it like so:
// this means that I've got a somestruct on the stack, and therefore it's got some memory behind it.
struct somestruct ptrToSomeStruct;
so ptrToSomeStruct.p says: take the contents of where ptrToSomeStruct lives (somewhere on the stack) and treat whatever is there as the value of a char, so you are reading some byte from the stack.
Reflecting the comments below:
One of the key problems faced by C (and similar languages) programmers is that sometimes a pointer ends up pointing to the wrong part of memory, so when you read the value you are reading from the wrong place to start with, and hence what you find there is wrong anyway. In many cases the wrong address is actually 0, which, as in my examples, means "go to the start of memory and read what's there". To help with programming errors where you actually have 0 in a pointer, many operating systems/architectures prevent you from reading or writing that memory, and when you do, your program gets an address exception/fault.
Typically, in classic pointers, a null pointer points to address 0x0. It depends on architecture and specific language, but if it is a primitive type then the value 0 would be considered NULL.
In Intel architectures, the beginning of memory (address 0) contains reserved space, which cannot be allocated. It is also outside the boundary of any running application. So a pointer pointing there would quite safely mean NULL as well.
Speaking in C:
The preprocessor macro NULL is #defined (by stdio.h or stddef.h), with the value 0 (or (void *)0).
-1) If I make front=NULL, what will be the value which actually gets stored in the different fields of front, say front->size and front->start_addr? Is it 0 or something else? I have limited knowledge of this NULL thing.
You will have front = 0x0. Doing front->size will raise a SIGSEGV.
-2) If I do a free(front); it frees the memory which is pointed to by front. So what exactly does free mean here: does it make it NULL or make it all 0?
free() will mark the memory once pointed to by front as free, so a later malloc/realloc call may reuse it. It does not set your pointer to NULL (it only receives a copy of the pointer's value), so front keeps its old, now dangling, value; and it certainly won't set the whole struct to 0.
-3) What can be a good strategy to deal with initialization of pointers and freeing them?
I like to initialize my pointers to NULL and set them to NULL after being deallocated.
Your question suggests that you don't understand pointers at all.
If you assign front = NULL, the compiler will do front = 0, and since front contained the address of the actual structure, you will lose the possibility to free it.
Read "Kernighan & Ritchie" once again.