std::queue::size() can return a huge number after pop() of size() == 0 - c++

I have the link here where I push(x) 10 ints, then pop() 11 times, and the size is not 0, nor an exception, but a tremendous number (probably == std::numeric_limits<size_type>::max()). I assume this is the consequence of the internal representation simply doing a size-- and not checking for an already empty() case. This seems like a bug in the libstdc++ library.
http://coliru.stacked-crooked.com/a/27ae7f10855e6c23

It's called "Undefined behaviour". To save time in the implementation (not checking if it's already empty), the implementation will simply decrement despite there being "nothing to decrement". Don't do that (and on another implementation it may not do that, so definitely don't rely on it doing anything meaningful). Since it's undefined, it may also dial out to Australia on your modem, erase your hard-disk or cause the application to crash. Or something else...

It is incorrect to pop an empty queue. Doing so means what happens is not defined by the standard, and most implementations (in release/optimized builds) simply do broken things and do not check.
If you need a safe pop, try:
template <class Q>
void safe_pop(Q& q) {
    if (!q.empty())
        q.pop();
}
and use safe_pop(que); instead of que.pop();.

Probably the queue's size data member gets decremented in pop() even if the queue is empty. Most likely the size data member has an unsigned integer type; when you decrement an unsigned zero it just wraps around to the biggest representable value.
EDIT: confirmed, 18446744073709551615 is 0xFFFFFFFFFFFFFFFF in hexadecimal, which is the biggest value that can be represented by an unsigned 64-bit (8-byte) integer.
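A minimal stand-alone sketch of the wrap-around itself, which is effectively what the queue's size bookkeeping does when you pop once too often (assuming a 64-bit size_t):

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t n = 0;
    --n;                       // unsigned "underflow": wraps around modulo 2^64
    std::cout << n << '\n';    // prints 18446744073709551615 (0xFFFFFFFFFFFFFFFF)
}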

When you pop one more time than you push, internally the size becomes -1. However, size_t is unsigned, so -1 wraps around to the maximum value of size_t. There is no intention to "fix" this issue because it would add an extra check to the code and might cause a performance hit. Unlike managed languages, C++ developers don't want anything extra that might cause even slight performance degradation :).
Here's related thread for GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55841

Related

Making unsigned integer underflow throw an exception

I understand that there are applications in which using unsigned integer over/underflow is a good way to get cheap modular arithmetic.
In my code, I use uint exclusively for indices to containers, so I never want this behaviour.
Is this a bad idea? Should I be using int everywhere instead? I do have to do some unsavoury things to get a for loop to count down to 0.
Is there a commonly used implementation of a less unsafe unsigned integer type? Something that throws an exception?
Do compilers (for me gcc, clang) provide a mechanism for less unsafe behaviour in the given compilation unit?
First, a terminology quibble: there is no such thing as unsigned integer underflow, precisely because of the way they wrap around (using modulo arithmetic), which is probably the phrase you meant.
Second, is this a common scenario to be in? Yes, it is a bit. You're not the only one doing "unsavoury things" with loops for reverse counting, and I bet there are a ton of bugs out there where people haven't done "unsavoury things" and, as a result, their code has an unsavoury infinite loop hidden in it. Mind you, I'm not sure I'd go so far as to call unsigned types "unsafe" as a result; like anything, they are the right tool for a subset of infinite possible jobs, and within that subset they are perfectly safe.
There is debate over whether unsigned integers should be used for array indexes at all. Some standard committee members believe that their use in the standard library was a mistake; I know that several members of the C++ community here on Stack Overflow also hate unsigned values and wish they'd go away.
Personally I think having access to the full range of the integer by default is absolutely crucial (and losing that is not worth it for a single "-1" sentinel value or whatever), so I think that — while you're not alone in this requirement, and it's a sensible requirement — using unsigned array indexes by default is a good thing. (And what the heck is a negative array index? Semantics, people!)
But that doesn't help you in this scenario. So, what can you do about it? No, there's no trapping unsigned integer implementation (at least, not one that I'm aware of, let alone widespread) because that would literally violate the rules of the type as defined by C++: it would introduce well-defined underflow/overflow semantics to a type for which underflow/overflow shouldn't even be possible.
You will have to use signed integers and check for "logical underflow" (i.e. going out of your desired range, say -1) yourself. You could wrap this behaviour in a class.
I suppose you could actually just wrap an unsigned integer while you're at it, adding some extra logic to operator-- and operator-= to detect a wrap-around and throw.
But I guess my point is that, whatever you do, it's going to be in your "code space" and thus subject to decreased performance. You can't eke out this behaviour from the platform itself.
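As a rough sketch of the "wrap an unsigned integer" idea above (the class name and interface are made up for illustration, and only the decrementing operators are shown):

#include <cstddef>
#include <stdexcept>

// Hypothetical wrapper: an unsigned index that throws instead of wrapping
// around when decremented below zero. Not a standard or widely used type.
class checked_index {
public:
    explicit checked_index(std::size_t v = 0) : value_(v) {}

    checked_index& operator--() {
        if (value_ == 0)
            throw std::underflow_error("checked_index: decrement below zero");
        --value_;
        return *this;
    }

    checked_index& operator-=(std::size_t n) {
        if (n > value_)
            throw std::underflow_error("checked_index: result would be negative");
        value_ -= n;
        return *this;
    }

    std::size_t get() const { return value_; }

private:
    std::size_t value_;
};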

Why does QVector::size() return int?

std::vector::size() returns a size_type, which is unsigned and usually the same as size_t, e.g. it is 8 bytes on 64-bit platforms.
In contrast, QVector::size() returns an int, which is usually 4 bytes even on 64-bit platforms, and on top of that it is signed, which means it can only go halfway to 2^32.
Why is that? This seems quite illogical and also technically limiting, and while it is not very likely that you will ever need more than 2^32 elements, the use of a signed int cuts that range in half for no apparent good reason. Perhaps to avoid compiler warnings for people too lazy to declare i as a uint rather than an int, who decided that making all containers return a size type that makes no sense is a better solution? The reason could not possibly be that dumb?
This has been discussed several times since Qt 3 at least, and the QtCore maintainer expressed a while ago that no change would happen until Qt 7, if it ever does.
When the discussion was going on back then, I thought that someone would bring it up on Stack Overflow sooner or later... and probably on several other forums and Q/A, too. Let us try to demystify the situation.
In general you need to understand that there is no better or worse here, as QVector is not a replacement for std::vector. The latter does not do any copy-on-write (COW), and that comes with a price. It is meant for a different use case, basically. It is mostly used inside Qt applications and the framework itself, initially for QWidgets in the early days.
size_t has its own issues, too, after all, which I will indicate below.
Without me interpreting the maintainer for you, I will just quote Thiago directly to carry the message of the official stance:
For two reasons:
1) it's signed because we need negative values in several places in the API:
indexOf() returns -1 to indicate a value not found; many of the "from"
parameters can take negative values to indicate counting from the end. So even
if we used 64-bit integers, we'd need the signed version of it. That's the
POSIX ssize_t or the Qt qintptr.
This also avoids sign-change warnings when you implicitly convert unsigneds to
signed:
-1 + size_t_variable => warning
size_t_variable - 1 => no warning
2) it's simply "int" to avoid conversion warnings or ugly code related to the
use of integers larger than int.
io/qfilesystemiterator_unix.cpp
size_t maxPathName = ::pathconf(nativePath.constData(), _PC_NAME_MAX);
if (maxPathName == size_t(-1))
io/qfsfileengine.cpp
if (len < 0 || len != qint64(size_t(len))) {
io/qiodevice.cpp
qint64 QIODevice::bytesToWrite() const
{
return qint64(0);
}
return readSoFar ? readSoFar : qint64(-1);
That was one email from Thiago, and then there is another where you can find a more detailed answer:
Even today, software that has a core memory of more than 4 GB (or even 2 GB)
is an exception, rather than the rule. Please be careful when looking at the
memory sizes of some process tools, since they do not represent actual memory
usage.
In any case, we're talking here about having one single container addressing
more than 2 GB of memory. Because of the implicitly shared & copy-on-write
nature of the Qt containers, that will probably be highly inefficient. You need
to be very careful when writing such code to avoid triggering COW and thus
doubling or worse your memory usage. Also, the Qt containers do not handle OOM
situations, so if you're anywhere close to your memory limit, Qt containers
are the wrong tool to use.
The largest process I have on my system is qtcreator and it's also the only
one that crosses the 4 GB mark in VSZ (4791 MB). You could argue that it is an
indication that 64-bit containers are required, but you'd be wrong:
Qt Creator does not have any container requiring 64-bit sizes, it simply
needs 64-bit pointers
It is not using 4 GB of memory. That's just VSZ (mapped memory). The total
RAM currently accessible to Creator is merely 348.7 MB.
And it is using more than 4 GB of virtual space because it is a 64-bit
application. The cause-and-effect relationship is the opposite of what you'd
expect. As a proof of this, I checked how much virtual space is consumed by
padding: 800 MB. A 32-bit application would never do that, that's 19.5% of the
addressable space on 4 GB.
(padding is virtual space allocated but not backed by anything; it's only
there so that something else doesn't get mapped to those pages)
Going into this topic even further with Thiago's responses, see this:
Personally, I'm VERY happy that Qt collection sizes are signed. It seems
nuts to me that an integer value potentially used in an expression using
subtraction be unsigned (e.g. size_t).
An integer being unsigned doesn't guarantee that an expression involving
that integer will never be negative. It only guarantees that the result
will be an absolute disaster.
On the other hand, the C and C++ standards define the behaviour of unsigned
overflows and underflows.
Signed integers do not overflow or underflow. I mean, they do because the types
and CPU registers have a limited number of bits, but the standards say they
don't. That means the compiler will always optimise assuming you don't over-
or underflow them.
Example:
for (int i = 1; i >= 1; ++i)
This is optimised to an infinite loop because signed integers do not overflow.
If you change it to unsigned, then the compiler knows that it might overflow
and come back to zero.
Some people didn't like that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
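As a compilable illustration of the loop example quoted above (my own sketch, not part of the original discussion; build without optimisations if you actually want to watch the unsigned loop run):

#include <cstdio>

// With a signed counter the compiler may assume i never overflows (overflow
// would be undefined behaviour), so "i >= 1" can be folded to "always true"
// and the loop becomes infinite.
void signed_version()
{
    for (int i = 1; i >= 1; ++i)
        ;                            // may be optimised into an endless loop
}

// With an unsigned counter the wrap-around is well defined: i eventually
// wraps to 0, "i >= 1u" becomes false, and the loop terminates after about
// 2^32 iterations on a platform with 32-bit unsigned int.
void unsigned_version()
{
    for (unsigned int i = 1; i >= 1u; ++i)
        ;
}

int main()
{
    unsigned_version();                  // terminates (eventually)
    std::puts("unsigned loop finished");
    // signed_version();                 // undefined behaviour: likely never returns
}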
unsigned numbers are values mod 2^n for some n.
Signed numbers are bounded integers.
Using unsigned values as approximations for 'positive integers' runs into the problem that common values are near the edge of the domain where unsigned values behave differently than plain integers.
The advantage is that unsigned approximation reaches higher positive integers, and under/overflow are well defined (if random when looked at as a model of Z).
But really, ptrdiff_t would be better than int.
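A small sketch of what the ptrdiff_t suggestion looks like in practice for the classic "count down to 0" loop (my own example, assuming C++11 for the brace initialisation):

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v = {10, 20, 30};

    // A signed index can legitimately become -1, so the loop condition
    // simply becomes false; there is no wrap-around trap.
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(v.size()) - 1; i >= 0; --i)
        std::printf("%d\n", v[static_cast<std::size_t>(i)]);

    // The unsigned equivalent needs the "unsavoury" idiom
    //     for (std::size_t i = v.size(); i-- > 0; ) ...
    // precisely because "i >= 0" is always true for unsigned types.
}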

Initialize a variable

Is it better to declare and initialize the variable or just declare it?
What's the best and the most efficient way?
For example, I have this code:
#include <stdio.h>

int main()
{
    int number = 0;

    printf("Enter with a number: ");
    scanf("%d", &number);
    if (number < 0)
        number = -number;
    printf("The modulo is: %d\n", number);
    return 0;
}
If I don't initialize number, the code works fine, but I want to know, is it faster, better, more efficient? Is it good to initialize the variable?
scanf can fail, in which case nothing is written to number. So if you want your code to be correct you need to initialize it (or check the return value of scanf).
The speed of incorrect code is usually irrelevant, but for your example code, if there is a difference in speed at all, I doubt you would ever be able to measure it. Setting an int to 0 is much faster than I/O.
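For completeness, a sketch of the "check the return value of scanf" alternative mentioned above, using the question's code (if the read fails, number is never used, so it does not need an initializer):

#include <stdio.h>

int main(void)
{
    int number;

    printf("Enter with a number: ");
    if (scanf("%d", &number) != 1) {
        fprintf(stderr, "Invalid input.\n");
        return 1;
    }
    if (number < 0)
        number = -number;
    printf("The modulo is: %d\n", number);
    return 0;
}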
Don't attribute speed to language; That attribute belongs to implementations of language. There are fast implementations and slow implementations. There are optimisations assosciated with fast implementations; A compiler that produces well-optimised machine code would optimise the initialisation away if it can deduce that it doesn't need the initialisation.
In this case, it actually does need the initialisation. Consider if scanf were to fail. When scanf fails, its return value reflects this failure. It'll either return:
A value less than zero if there was a read error or EOF (which can be triggered in an implementation-defined way, typically CTRL+Z on Windows and CTRL+d on Linux),
A number less than the number of objects provided to scanf (since you've provided only one object, this failure return value would be 0) when a conversion failure occurs (for example, entering 'a' on stdin when you've told scanf to convert sequences of '0'..'9' into an integer),
The number of objects scanf managed to assign to. This is 1, in your case.
Since you aren't checking for any of these return values (particular #3), your compiler can't deduce that the initialisation is necessary and hence, can't optimise it away. When the variable is uninitialised, failure to check these return values results in undefined behaviour. A chicken might appear to be living, even when it is missing its head. It would be best to check the return value of scanf. That way, when your variable is uninitialised you can avoid using an uninitialised value, and when it isn't your compiler can optimise away the initialisations, presuming you handle erroneous return values by producing error messages rather than using the variable.
edit: On that topic of undefined behaviour, consider what happens in this code:
if(number < 0)
number= -number;
If number is -32768, and INT_MAX is 32767, then section 6.5, paragraph 5 of the C standard applies because -(-32768) isn't representable as an int.
Section 6.5, paragraph 5 says:
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
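A hedged sketch of how one might guard against that case, using the limits from <limits.h> (my own addition, not part of the original answer):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int number = INT_MIN;

    if (number < 0) {
        if (number == INT_MIN) {
            /* -INT_MIN is not representable as an int, so negating it here
               would be undefined behaviour (C standard, 6.5p5). */
            fprintf(stderr, "Cannot take the absolute value of %d\n", number);
            return 1;
        }
        number = -number;
    }
    printf("The modulo is: %d\n", number);
    return 0;
}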
Suppose you don't initialize a variable and your code is buggy (e.g. you forgot to read number). Then the uninitialized value of number is garbage, and different runs will output (or behave in) different ways.
But if you initialize all of your variables, the program will produce a consistent result: an easy-to-trace error.
Yes, initialization adds extra steps to your code at a low level, for example mov $0, 28(%esp). But it is a one-time cost and doesn't kill your code's efficiency.
So always initializing is good practice!
With modern compilers, there isn't going to be any difference in efficiency. Coding style is the main consideration. In general, your code is more self-explanatory and less likely to have mistakes if you initialize all variables upon declaring them. In the case you gave, though, since the variable is effectively initialized by the scanf, I'd consider it better not to have a redundant initialization.
First, you need to answer these questions:
1) How many times is this function called? If you call it 10,000,000 times, then it's a good idea to have the best version.
2) If I don't initialize my variable, am I sure that my code is safe and won't throw any exception?
After that: an int initialization doesn't change much in your code, but a string initialization does.
Be sure that you do all the checks, because if you have an uninitialized variable your program is potentially buggy.
I can't tell you how many times I've seen simple errors because a programmer doesn't initialize a variable. Just two days ago there was another question on SO where the end result of the issue being faced was simply that the OP didn't initialize a variable and thus there were problems.
When you talk about "speed" and "efficiency", don't simply consider how much faster the code might compile or run (and in this case it's pretty much irrelevant anyway), but consider your debugging time when there's a simple mistake in the code due to the fact that you didn't initialize a variable that very easily could have been.
Note also, my experience is that when coding for larger corporations they will run your code through tools like Coverity or Klocwork, which will ding you for uninitialized variables because they present a security risk.

(How) do you handle possible integer overflows in C++ code?

Every now and then, especially when doing 64bit builds of some code base, I notice that there are plenty of cases where integer overflows are possible. The most common case is that I do something like this:
// Creates a QPixmap out of some block of data; this function comes from library A
QPixmap createFromData( const char *data, unsigned int len );
const std::vector<char> buf = createScreenShot();
return createFromData( &buf[0], buf.size() ); // <-- warning here in 64bit builds
The thing is that std::vector::size() nicely returns a size_t (which is 8 bytes in 64bit builds) but the function happens to take an unsigned int (which is still only 4 bytes in 64bit builds). So the compiler warns correctly.
If possible, I try to fix up the signatures to use the correct types in the first place. However, I'm often hitting this problem when combining functions from different libraries which I cannot modify. Unfortunately, I often resort to some reasoning along the lines of "Okay, nobody will ever do a screenshot generating more than 4GB of data, so why bother" and just change the code to do
return createFromData( &buf[0], static_cast<unsigned int>( buf.size() ) );
So that the compiler shuts up. However, this feels really evil. So I've been considering to have some sort of runtime assertion which at least yields a nice error in the debug builds, as in:
assert( buf.size() < std::numeric_limits<unsigned int>::max() );
This is a bit nicer already, but I wonder: how do you deal with this sort of problem, that is: integer overflows which are "almost" impossible (in practice). I guess that means that they don't occur for you, they don't occur for QA - but they explode in the face of the customer.
If you can't fix the types (because you can't break library compatibility), and you're "confident" that the size will never get that big, you can use boost::numeric_cast in place of the static_cast. This will throw an exception if the value is too big.
Of course the surrounding code then has to do something vaguely sensible with the exception - since it's a "not expected ever to happen" condition, that might just mean shutting down cleanly. Still better than continuing with the wrong size.
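A short usage sketch of that suggestion (my own example; it assumes Boost is available and only shows the conversion itself, not the question's QPixmap call):

#include <boost/numeric/conversion/cast.hpp>
#include <iostream>

int main()
{
    unsigned long long big = 5000000000ULL;   // does not fit into a 32-bit unsigned int
    try {
        unsigned int len = boost::numeric_cast<unsigned int>(big);
        std::cout << "converted: " << len << '\n';
    } catch (const boost::numeric::bad_numeric_cast& e) {
        // e.g. boost::numeric::positive_overflow; shut down cleanly instead of truncating
        std::cerr << "size does not fit: " << e.what() << '\n';
    }
}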
The solution depends on context. In some cases, you know where the data comes from, and can exclude overflow: an int that is initialized with 0 and incremented once a second, for example, isn't going to overflow anytime in the lifetime of the machine. In such cases, you just convert (or allow the implicit conversion to do its stuff), and don't worry about it.
Another type of case is fairly similar: in your case, for example, it's probably not reasonable for a screen shot to have more data than can be represented by an int, so the conversion is also safe. Provided the data really did come from a screen shot; in such cases, the usual procedure is to validate the data on input, ensuring that it fulfills your constraints downstream, and then do no further validation.
Finally, if you have no real control over where the data is coming from, and can't validate on entry (at least not for your constraints downstream), you're stuck with using some sort of checking conversion, validating immediately at the point of conversion.
If you push a 64-bit overflowing number into a 32-bit library you open pandora's box -- undefined behaviour.
Throw an exception. Since exceptions can in general spring up arbitrarily anywhere you should have suitable code to catch it anyway. Given that, you may as well exploit it.
Error messages are unpleasant but they're better than undefined behaviour.
Such scenarios can be handled in one of four ways, or using a combination of them:
use right types
use static assertions
use runtime assertions
ignore until hurts
Usually the best approach is to use the right types, right until your code gets ugly, and then roll in static assertions. Static assertions are much better than runtime assertions for this very purpose.
Finally, when static assertions won't work (like in your example), you use runtime assertions - yes, they get into customers' faces, but at least your program behaves predictably. Yes, customers don't like assertions - they start to panic ("we have an error!" in all caps), but without an assertion the program would likely misbehave and there would be no easy way to diagnose the problem.
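For the "static assertion" option, a sketch of what that could look like here (my own illustration, using C++11 static_assert): it documents the assumption that every size fits, and deliberately breaks the build on platforms where it doesn't, which is exactly the point at which you fall back to a runtime check.

#include <cstddef>
#include <limits>

// Fails to compile on platforms where a std::size_t value may not fit into
// unsigned int, forcing the conversion to be handled explicitly there.
static_assert(std::numeric_limits<std::size_t>::max()
                  <= std::numeric_limits<unsigned int>::max(),
              "size_t does not fit into unsigned int on this platform");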
One thing just came to my mind: since I need some sort of runtime check (whether or not the value of e.g. buf.size() exceeds the range of unsigned int can only be tested at runtime), but I do not want to have a million assert() invocations everywhere, I could do something like
template <typename T, typename U>
T integer_cast( U v ) {
    assert( v < std::numeric_limits<T>::max() );
    return static_cast<T>( v );
}
That way, I would at least have the assertion centralized, and
return createFromData( &buf[0], integer_cast<unsigned int>( buf.size() ) );
is a tiny bit better. Maybe I should rather throw an exception (it is quite exceptional indeed!) instead of assert'ing, to give the caller a chance to handle the situation gracefully by rolling back previous work and issuing diagnostic output or the like.
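A minimal sketch of that throwing variant (my own take on the question's integer_cast; for simplicity it assumes both types are unsigned, as in the size_t-to-unsigned-int case discussed here):

#include <limits>
#include <stdexcept>

// Throws instead of asserting, so release builds are protected as well.
template <typename To, typename From>
To throwing_integer_cast( From v )
{
    if (v > static_cast<From>(std::numeric_limits<To>::max()))
        throw std::range_error("integer_cast: value does not fit into the target type");
    return static_cast<To>(v);
}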

Why is address zero used for the null pointer?

In C (or C++ for that matter), pointers are special if they have the value zero: I am advised to set pointers to zero after freeing their memory, because it means freeing the pointer again isn't dangerous; when I call malloc it returns a pointer with the value zero if it can't get me memory; I use if (p != 0) all the time to make sure passed pointers are valid, etc.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other? How can 0 be used for handling null pointers if that is the case? Why isn't a negative number null instead?
Edit:
A bunch of good answers. I'll summarize what has been said in the answers expressed as my own mind interprets it and hope that the community will correct me if I misunderstand.
Like everything else in programming it's an abstraction. Just a constant, not really related to the address 0. C++0x emphasizes this by adding the keyword nullptr.
It's not even an address abstraction, it's the constant specified by the C standard and the compiler can translate it to some other number as long as it makes sure it never equals a "real" address, and equals other null pointers if 0 is not the best value to use for the platform.
In case it's not an abstraction, which was the case in the early days, the address 0 is used by the system and off limits to the programmer.
My negative number suggestion was a little wild brainstorming, I admit. Using a signed integer for addresses is a little wasteful if it means that apart from the null pointer (-1 or whatever) the value space is split evenly between positive integers that make valid addresses and negative numbers that are just wasted.
If any number is always representable by a datatype, it's 0. (Probably 1 is too. Think of a one-bit integer, which would be 0 or 1 if unsigned, or just the sign bit if signed, or a two-bit integer, which would cover [-2, 1]. But then you could just go for 0 being null and 1 being the only accessible byte in memory.)
Still there is something that is unresolved in my mind. The Stack Overflow question Pointer to a specific fixed address tells me that even if 0 for null pointer is an abstraction, other pointer values aren't necessarily. This leads me to post another Stack Overflow question, Could I ever want to access the address zero?.
2 points:
only the constant value 0 in the source code is the null pointer - the compiler implementation can use whatever value it wants or needs in the running code. Some platforms have a special pointer value that's 'invalid' that the implementation might use as the null pointer. The C FAQ has a question, "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?", that points out several platforms that used this property of 0 being the null pointer in C source while represented differently at runtime. The C++ standard has a note that makes clear that converting "an integral constant expression with value zero always yields a null pointer, but converting other expressions that happen to have value zero need not yield a null pointer".
a negative value might be just as usable by the platform as an address - the C standard simply had to choose something to use to indicate a null pointer, and zero was chosen. I'm honestly not sure if other sentinel values were considered.
The only requirements for a null pointer are:
it's guaranteed to compare unequal to a pointer to an actual object
any two null pointers will compare equal (C++ refines this such that this only needs to hold for pointers to the same type)
Historically, the address space starting at 0 was always ROM, used for some operating system or low-level interrupt handling routines. Nowadays, since everything is virtual (including the address space), the operating system can map any allocation to any address, so it can specifically NOT allocate anything at address 0.
IIRC, the "null pointer" value isn't guaranteed to be zero. The compiler translates 0 into whatever "null" value is appropriate for the system (which in practice is probably always zero, but not necessarily). The same translation is applied whenever you compare a pointer against zero. Because you can only compare pointers against each other and against this special-value-0, it insulates the programmer from knowing anything about the memory representation of the system. As for why they chose 0 instead of 42 or somesuch, I'm going to guess it's because most programmers start counting at 0 :) (Also, on most systems 0 is the first memory address and they wanted it to be convenient, since in practice translations like I'm describing rarely actually take place; the language just allows for them).
You must be misunderstanding the meaning of constant zero in pointer context.
Neither in C nor in C++ can pointers "have value zero". Pointers are not arithmetic objects. They cannot have numerical values like "zero" or "negative" or anything of that nature. So your statement about "pointers ... have the value zero" simply makes no sense.
In C & C++ pointers can have the reserved null-pointer value. The actual representation of the null-pointer value has nothing to do with any "zeros". It can be absolutely anything appropriate for a given platform. It is true that on most platforms the null-pointer value is represented physically by an actual zero address value. However, if on some platform address 0 is actually used for some purpose (i.e. you might need to create objects at address 0), the null-pointer value on such a platform will most likely be different. It could be physically represented as the 0xFFFFFFFF address value or as the 0xBAADBAAD address value, for example.
Nevertheless, regardless of how the null-pointer value is represented on a given platform, in your code you will still continue to designate null pointers by the constant 0. In order to assign a null-pointer value to a given pointer, you will continue to use expressions like p = 0. It is the compiler's responsibility to realize what you want and translate it into the proper null-pointer value representation, i.e. to translate it into the code that will put the address value of 0xFFFFFFFF into the pointer p, for example.
In short, the fact that you use 0 in your source code to generate null-pointer values does not mean that the null-pointer value is somehow tied to address 0. The 0 that you use in your source code is just "syntactic sugar" that has absolutely no relation to the actual physical address the null-pointer value is "pointing" to.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other?
On some/many/all operating systems, memory address 0 is special in some way. For example, it's often mapped to invalid/non-existent memory, which causes an exception if you try to access it.
Why isn't a negative number null instead?
I think that pointer values are typically treated as unsigned numbers: otherwise for example a 32-bit pointer would only be able to address 2 GB of memory, instead of 4 GB.
My guess would be that the magic value 0 was picked to define an invalid pointer since it could be tested for with fewer instructions. Some machine languages automatically set the zero and sign flags according to the data when loading registers, so you could test for a null pointer with a simple load and then a branch instruction, without doing a separate compare instruction.
(Most ISAs only set flags on ALU instructions, not loads, though. And usually you aren't producing pointers via computation, except in the compiler when parsing C source. But at least you don't need an arbitrary pointer-width constant to compare against.)
On the Commodore Pet, Vic20, and C64 which were the first machines I worked on, RAM started at location 0 so it was totally valid to read and write using a null pointer if you really wanted to.
I think it's just a convention. There must be some value to mark an invalid pointer.
You just lose one byte of address space, that should rarely be a problem.
There are no negative pointers. Pointers are always unsigned. Also if they could be negative your convention would mean that you lose half the address space.
Although C uses 0 to represent the null pointer, do keep in mind that the value of the pointer itself may not be a zero. However, most programmers will only ever use systems where the null pointer is, in fact, 0.
But why zero? Well, it's one address that every system shares. And oftentimes the low addresses are reserved for operating system purposes thus the value works well as being off-limits to application programs. Accidental assignment of an integer value to a pointer is as likely to end up zero as anything else.
Historically the low memory of an application was occupied by system resources. It was in those days that zero became the default null value.
While this is not necessarily true for modern systems, it is still a bad idea to set pointer values to anything but what memory allocation has handed you.
Regarding the argument about not setting a pointer to null after deleting it so that future deletes "expose errors"...
If you're really, really worried about this then a better approach, one that is guaranteed to work, is to leverage assert():
...
assert(ptr && "You're deleting this pointer twice, look for a bug?");
delete ptr;
ptr = 0;
...
This requires some extra typing, and one extra check during debug builds, but it is certain to give you what you want: notice when ptr is deleted 'twice'. The alternative given in the comment discussion, not setting the pointer to null so you'll get a crash, is simply not guaranteed to be successful. Worse, unlike the above, it can cause a crash (or much worse!) on a user if one of these "bugs" gets through to the shelf. Finally, this version lets you continue to run the program to see what actually happens.
I realize this does not answer the question asked, but I was worried that someone reading the comments might come to the conclusion that it is considered 'good practice' to NOT set pointers to 0 if it is possible they get sent to free() or delete twice. In those few cases when it is possible, it is NEVER a good practice to use Undefined Behavior as a debugging tool. Nobody that's ever had to hunt down a bug that was ultimately caused by deleting an invalid pointer would propose this. These kinds of errors take hours to hunt down and nearly always affect the program in a totally unexpected way that is hard or impossible to track back to the original problem.
An important reason why many operating systems use all-bits-zero for the null pointer representation, is that this means memset(struct_with_pointers, 0, sizeof struct_with_pointers) and similar will set all of the pointers inside struct_with_pointers to null pointers. This is not guaranteed by the C standard, but many, many programs assume it.
In one of the old DEC machines (PDP-8, I think), the C runtime would memory protect the first page of memory so that any attempt to access memory in that block would cause an exception to be raised.
The choice of sentinel value is arbitrary, and this is in fact being addressed by the next version of C++ (informally known as "C++0x", most likely to be known in the future as ISO C++ 2011) with the introduction of the keyword nullptr to represent a null valued pointer. In C++, a value of 0 may be used as an initializing expression for any POD and for any object with a default constructor, and it has the special meaning of assigning the sentinel value in the case of a pointer initialization. As for why a negative value was not chosen, addresses usually range from 0 to 2^N - 1 for some value N. In other words, addresses are usually treated as unsigned values. If the maximum value were used as the sentinel value, then it would have to vary from system to system depending on the size of memory, whereas 0 is always a representable address. It is also used for historical reasons, as memory address 0 was typically unusable in programs, and nowadays most OSs have parts of the kernel loaded into the lower page(s) of memory, and such pages are typically protected in such a way that, if touched (dereferenced) by a program (save the kernel), they will cause a fault.
It has to have some value. Obviously you don't want to step on values the user might legitimately want to use. I would speculate that since the C runtime provides the BSS segment for zero-initialized data, it makes a certain degree of sense to interpret zero as an un-initialized pointer value.
Rarely does an OS allow you to write to address 0. It's common to stick OS-specific stuff down in low memory; namely, IDTs, page tables, etc. (The tables have to be in RAM, and it's easier to stick them at the bottom than to try and determine where the top of RAM is.) And no OS in its right mind will let you edit system tables willy-nilly.
This may not have been on K&R's minds when they made C, but it (along with the fact that 0==null is pretty easy to remember) makes 0 a popular choice.
The value 0 is a special value that takes on various meanings in specific expressions. In the case of pointers, as has been pointed out many many times, it is used probably because at the time it was the most convenient way of saying "insert the default sentinel value here." As a constant expression, it does not have the same meaning as bitwise zero (i.e., all bits set to zero) in the context of a pointer expression. In C++, there are several types that do not have a bitwise-zero representation of NULL, such as pointer-to-member and pointer-to-member-function.
Thankfully, C++0x has a new keyword for "expression that means a known invalid pointer that does not also map to bitwise zero for integral expressions": nullptr. Although there are a few systems that you can target with C++ that allow dereferencing of address 0 without barfing, so programmer beware.
There are already a lot of good answers in this thread; there are probably many different reasons for preferring the value 0 for null pointers, but I'm going to add two more:
In C++, zero-initializing a pointer will set it to null.
On many processors it is more efficient to set a value to 0 or to test for it equal/not equal to 0 than for any other constant.
This is dependent on the implementation of pointers in C/C++. There is no specific reason why NULL is equivalent in assignments to a pointer.
A null pointer is not the same thing as a null value. For example, the strchr function of C will return a null pointer (which prints as nothing on the console), while returning the value would print (null) on the console.
True function:
char *ft_strchr(const char *s, int c)
{
    int i;

    if (!s)
        return (NULL);
    i = 0;
    while (s[i])
    {
        if (s[i] == (char)c)
            return ((char*)(s + i));
        i++;
    }
    if (s[i] == (char)c)            /* <-- the highlighted lines */
        return ((char*)(s + i));
    return (NULL);
}
This produces empty output (the null pointer prints as nothing).
While returning the value s[i] instead gives us (null) as the output:
char *ft_strchr(const char *s, int c)
{
    int i;

    if (!s)
        return (NULL);
    i = 0;
    while (s[i])
    {
        if (s[i] == (char)c)
            return ((char*)(s + i));
        i++;
    }
    if (s[i] == (char)c)            /* <-- the changed lines */
        return (s[i]);
    return (NULL);
}
There are historic reasons for this, but there are also optimization reasons for it.
It is common for the OS to provide a process with memory pages initialized to 0. If a program wants to interpret part of that memory page as a pointer then it is 0, so it is easy enough for the program to determine that that pointer is not initialized. (this doesn't work so well when applied to uninitialized flash pages)
Another reason is that on many many processors it is very very easy to test a value's equivalence to 0. It is sometimes a free comparison done without any extra instructions needed, and usually can be done without needing to provide a zero value in another register or as a literal in the instruction stream to compare to.
The cheap comparisons for most processors are the signed less than 0, and equal to 0. (signed greater than 0 and not equal to 0 are implied by both of these)
Since one value out of all of the possible values needs to be reserved as bad or uninitialized, you might as well make it the one that has the cheapest test for equivalence to the bad value. This is also true for '\0'-terminated character strings.
If you were to try to use greater or less than 0 for this purpose then you would end up chopping your range of addresses in half.
The constant 0 is used instead of NULL because C was made by some cavemen trillions of years ago, NULL, NIL, ZIP, or NADDA would have all made much more sense than 0.
But since memory addressing starts at 0, isn't 0 just as a valid address as any other?
Indeed. Although a lot of operating systems disallow you from mapping anything at address zero, even in a virtual address space (people realized C is an insecure language and, reflecting on the fact that null pointer dereference bugs are very common, decided to "fix" them by disallowing userspace code from mapping page 0; thus, if you call a callback but the callback pointer is NULL, you won't end up executing some arbitrary code).
How can 0 be used for handling null pointers if that is the case?
Because a 0 used in comparison with a pointer will be replaced with some implementation-specific value: the null pointer, which is also the return value of malloc on a malloc failure.
Why isn't a negative number null instead?
This would be even more confusing.