If I come across old code that does if (!this) return; in an app, how severe a risk is this? Is it a dangerous ticking time bomb that requires an immediate app-wide search and destroy effort, or is it more like a code smell that can be quietly left in place?
I am not planning on writing code that does this, of course. Rather, I've recently discovered something in an old core library used by many pieces of our app.
Imagine a CLookupThingy class has a non-virtual CThingy *CLookupThingy::Lookup( name ) member function. Apparently one of the programmers back in those cowboy days encountered many crashes where NULL CLookupThingy pointers were being returned from functions, and rather than fixing hundreds of call sites, he quietly fixed up Lookup():
CThingy *CLookupThingy::Lookup( const char *name )
{
if (!this)
{
return NULL;
}
// else do the lookup code...
}
// now the above can be used like
CLookupThingy *GetLookup()
{
if (notReady()) return NULL;
// else etc...
}
CThingy *pFoo = GetLookup()->Lookup( "foo" ); // will set pFoo to NULL without crashing
I discovered this gem earlier this week, but now am conflicted as to whether I ought to fix it. This is in a core library used by all of our apps. Several of those apps have already been shipped to millions of customers, and it seems to be working fine; there are no crashes or other bugs from that code. Removing the if !this in the lookup function will mean fixing thousands of call sites that potentially pass NULL; inevitably some will be missed, introducing new bugs that will pop up randomly over the next year of development.
So I'm inclined to leave it alone, unless absolutely necessary.
Given that it is technically undefined behavior, how dangerous is if (!this) in practice? Is it worth man-weeks of labor to fix, or can MSVC and GCC be counted on to safely return NULL?
Our app compiles on MSVC and GCC, and runs on Windows, Ubuntu, and MacOS. Portability to other platforms is irrelevant. The function in question is guaranteed to never be virtual.
Edit: The kind of objective answer I am looking for is something like
"Current versions of MSVC and GCC use an ABI where nonvirtual members are really statics with an implicit 'this' parameter; therefore they will safely branch into the function even if 'this' is NULL" or
"a forthcoming version of GCC will change the ABI so that even nonvirtual functions require loading a branch target from the class pointer" or
"the current GCC 4.5 has an inconsistent ABI where sometimes it compiles nonvirtual members as direct branches with an implicit parameter, and sometimes as class-offset function pointers."
The former means the code is stinky but unlikely to break; the second is something to test after a compiler upgrade; the latter requires immediate action even at high cost.
Clearly this is a latent bug waiting to happen, but right now I'm only concerned with mitigating risk on our specific compilers.
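To illustrate the mechanism I'm assuming in the first bullet, here is a rough sketch of the lowering I have in mind. This is illustrative pseudo-output only, not real compiler output, and the mangled-looking name is made up:
#include <cstddef>

class CThingy;
class CLookupThingy;

// Hypothetical lowering: the compiler treats the non-virtual member as a
// free function with 'this' passed as a hidden first argument, so nothing
// dereferences the pointer at the call site itself.
CThingy *CLookupThingy__Lookup(CLookupThingy *this_, const char *name)
{
    if (!this_)       // the hack: compare the hidden argument...
        return NULL;  // ...before anything dereferences it
    // ... the real lookup code, which would dereference this_ ...
    return NULL;      // placeholder so the sketch stands alone
}

// CThingy *pFoo = GetLookup()->Lookup("foo");
// becomes, in effect:
// CThingy *pFoo = CLookupThingy__Lookup(GetLookup(), "foo");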
I would leave it alone. This might have been a deliberate choice as an old-fashioned version of the SafeNavigationOperator. As you say, in new code, I wouldn't recommend it, but for existing code, I'd leave it alone. If you do end up modifying it, I'd make sure that all calls to it are well-covered by tests.
Edit to add: you could choose to remove it only in debug versions of your code via:
CThingy *CLookupThingy::Lookup( const char *name )
{
#if !defined(DEBUG)
if (!this)
{
return NULL;
}
#endif
// else do the lookup code...
}
Thus, it wouldn't break anything on production code, while giving you a chance to test it in debug mode.
Like all undefined behavior
if (!this)
{
return NULL;
}
this is a bomb waiting to go off. If it works with your current compilers, you are kind-of lucky, kind-of unlucky!
The next release of the same compilers might be more aggressive and treat this as dead code: since this can never legitimately be null, the check can "safely" be removed.
I think it is better if you removed it!
If you have many GetLookup-style functions returning NULL, then you're better off fixing the code that calls methods through a NULL pointer. First, replace
if (!this) return NULL;
with
if (!this) {
// TODO(Crashworks): Replace this case with an assertion on July, 2012, once all callers are fixed.
printf("Please mail the following stack trace to myemailaddress. Thanks!");
print_stacktrace();
return NULL;
}
Now, carry on with your other work, but fix these as they roll in. Replace:
GetLookup(x)->Lookup(y)...
with
convert_to_proxy(GetLookup(x))->Lookup(y)...
Where convert_to_proxy returns the pointer unchanged, unless it's NULL, in which case it returns a FailedLookupObject as in my other answer.
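For concreteness, a minimal sketch of what convert_to_proxy might look like, assuming Lookup is made virtual so the sentinel's override actually dispatches (names beyond those above are hypothetical):
// Sketch only: assumes CLookupThingy::Lookup is (or is made) virtual.
class CFailedLookupThingy : public CLookupThingy {
public:
    virtual CThingy *Lookup(const char *name) { return NULL; } // always "not found"
};

static CFailedLookupThingy failed_lookup; // one shared sentinel instance

CLookupThingy *convert_to_proxy(CLookupThingy *p)
{
    return p ? p : &failed_lookup; // real objects pass through; NULL becomes the sentinel
}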
It may not crash in most compilers since non-virtual functions are typically either inlined or translated into non-member functions taking "this" as a parameter. However, the standard specifically says that calling a non-static member function outside the lifetime of the object is undefined, and the lifetime of an object is defined as beginning when memory for the object has been allocated and the constructor has completed, if it has non-trivial initialization.
The standard only makes an exception to this rule for calls made by the object itself during construction or destruction, but even then one must be careful because the behavior of virtual calls can differ from the behavior during the object's lifetime.
TL;DR: I'd kill it with fire, even if it will take a long time to clean up all the call sites.
Future versions of the compiler are likely to more aggressively optimize in cases of formally undefined behavior. I wouldn't worry about existing deployments (where you know the behavior the compiler actually implemented), but it should be fixed in the source code in case you ever use a different compiler or different version.
This is something that's called 'a smart and ugly hack'. Note: smart != wise.
Finding all the call sites without any refactoring tools should be easy enough: break GetLookup() somehow so it doesn't compile (e.g. change its signature), so you can identify misuse statically. Then add a function called DoLookup() which does what all these hacks are doing right now.
In this case I'd suggest removing the NULL check from the member function and creating a non-member function
CThingy *SafeLookup(CLookupThingy *lookupThing, const char *name) {
    if (lookupThing == NULL) {
        return NULL;
    } else {
        return lookupThing->Lookup(name);
    }
}
Then it should be easy enough to find every call to the Lookup member function and replace it with the safe non-member function.
If it's something that's bothering you today, it'll bother you a year from now. As you pointed out, changing it will almost certainly introduce some bugs -- but you can begin by retaining the return NULL functionality, adding a bit of logging, letting it run in the wild for a few weeks, and finding out how many times it even gets hit.
This is a "ticking bomb" only if you are pedantic about the wording of the specification. However, regardless, it is a terrible, ill-advised approach because it obscures a program error. For that reason alone, I would remove it, even if it means considerable work. It is not an immediate (or even middle-term) risk, but it just isn't a good approach.
Such error hiding behavior really isn't something you want to rely on, either. Imagine you rely on this behavior (i.e. it doesn't matter whether my objects are valid, it will work anyway!) and then, by some hazard, the compiler optimizes out the if statement in a particular case because it can prove that this is not a null pointer. That is a legitimate optimization not just for some hypothetical future compiler, but for very real, present-time compilers as well.
But of course, since your program isn't well-formed, it happens that at some point you pass it a null this around 20 corners. Bang, you're dead.
That's very contrived, admittedly, and it probably won't happen, but you cannot be 100% certain that it cannot possibly happen.
Note that when I shout out "remove!", that does not mean the whole lot of them must be removed immediately or in one massive manpower operation. You could remove these checks one by one as you encounter them (when you change something in the same file anyway, to avoid recompilations), or just text-search for one (preferably in a highly used function), and remove that one.
Since you are using GCC, you may be interested in __builtin_return_address, which may help you remove these checks without massive manpower and without totally disrupting the whole workflow or rendering the application unusable.
Before removing the check, modify it to output the caller's address, and addr2line will tell you the location in your source. That way, you should be able to quickly identify all the locations in the application that are behaving wrongly, so you can fix them.
So instead of
if(!this) return 0;
change one location at a time to something like:
if(!this) { __builtin_printf("!!! %p\n", __builtin_return_address(0)); return 0; }
That lets you identify the invalid call sites for this change while still letting the program "work as intended" (you can also query the caller's caller if needed). Fix every ill-behaved location, one by one. The program will still "work" as normal.
Once no more addresses come up, remove the check altogether. You might still have to fix one or the other crash if you are unlucky (because it didn't show while you tested), but that should be a very rare occurrence. In any case, it should prevent your co-worker from shouting at you.
Remove one or two checks per week, and eventually none will be left. Meanwhile, life goes on and nobody notices what you're doing at all.
TL;DR
As for "current versions of GCC", you are fine for non-virtual functions, but of course nobody can tell what a future version might do. I deem it however highly unlikely that a future version will cause your code to break. Not few existing projects have this kind of check (I remember we had literally hundreds of them in Code::Blocks code completion at some time, don't ask me why!). Compiler makers probably don't want to make dozens/hundreds of major project maintainers unhappy on purpose, only to prove a point.
Also, consider the last paragraph ("from a logical point of view"). Even if this check will crash and burn with a future compiler, it will crash and burn anyway.
The if(!this) return; statement is somewhat useless insofar as this cannot ever be a null pointer in a well-formed program (it means you called a member function on a null pointer). This does not mean, of course, that it couldn't possibly happen. But in this case, the program should crash hard or abort with an assertion. Under no conditions should such a program continue silently.
On the other hand, it is perfectly possible to call a member function on an invalid object that happens to be not null. Checking whether this is the null pointer obviously doesn't catch that case, but it is the exact same UB. So, apart from hiding wrong behavior, this check also only detects one half of the problematic cases.
If you are going by the wording of the specification, using this (which includes checking whether it's a null pointer) is undefined behavior. Insofar, strictly speaking, it is a "time bomb". However, I would not reasonably deem that a problem, both from a practical point of view and from a logical one.
From a practical point of view, it doesn't really matter whether you read a pointer that is not valid as long as you do not dereference it. Yes, strictly to the letter, this is not allowed. Yes, in theory someone might build a CPU which will check invalid pointers when you load them, and fault. Alas, this isn't the case, if you're being real.
From a logical point of view, assuming that the check will blow up, it still isn't going to happen. For this statement to be executed, the member function must be called, and (virtual or not, inlined or not) it is using an invalid this, which it makes available inside the function body. If one illegitimate use of this blows up, the other will, too. Thus, the check is obsolete because the program already crashes earlier.
n.b.: This check is very similar to the "safe delete idiom" which sets a pointer to nullptr after deleting it (using a macro or a templated safe_delete function). Presumably, this is "safe" because it doesn't crash deleting the same pointer twice. Some people even add a redundant if(!ptr) delete ptr;.
As you know, operator delete is guaranteed to be a no-op on a null pointer. Which means no more and no less than by setting a pointer to the null pointer, you have successfully eliminated the only chance to detect double deletion (which is a program error that needs to be fixed!). It is not any "safer", but it instead hides incorrect program behavior. If you delete an object twice, the program should crash hard.
You should either leave a deleted pointer alone, or, if you insist on tampering, set it to a non-null invalid pointer (such as the address of a special "invalid" global variable, or a magic value like -1 if you will -- but you should not try to cheat and hide the crash when it is to occur).
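A minimal sketch of that last suggestion; the helper name and the magic value are illustrative, not a library facility:
#include <cstdint>

// Instead of hiding a double delete behind a null pointer, poison the
// pointer so any further use crashes loudly at the offending site.
template <typename T>
void poison_delete(T *&p)
{
    delete p;
    p = reinterpret_cast<T *>(static_cast<std::uintptr_t>(-1)); // known-bad address
}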
You can safely fix this today by returning a failed lookup object.
class Interface {
public:
    virtual CThingy *Lookup(string const &name) = 0;
    virtual operator bool() const { return true; } // real lookups test as true
    virtual ~Interface() {}
};
class CLookupThingy: public Interface {
    // ...
};
class CFailedLookupThingy: public Interface {
public:
    CThingy *Lookup(string const &name) {
        return NULL;
    }
    operator bool() const { return false; } // so that *GetLookup() can be tested in a condition.
} failed_lookup;
Interface *GetLookup() {
    if (notReady())
        return &failed_lookup;
    // else etc...
}
This code still works:
CThingy *pFoo = GetLookup()->Lookup( "foo" ); // will set pFoo to NULL without crashing
It's my personal opinion that you should fail as early as possible to alert you to problems. In that case, I'd unceremoniously remove each and every occurrence of if(!this) I could find.
The following crashes with a seg-V:
// my code
int* ipt;
bool set = false;
void Set(int* i) {
ASSERT(i);
ipt = i;
set = true;
}
int Get() {
return set ? *ipt : 0;
}
// code that I don't control.
struct S { int I; int J; };
int main() {
S* ip = NULL;
// code that, as a bug, forgets to set ip...
Set(&ip->J);
// gobs of code
return Get();
}
This is because while i is not NULL it still isn't valid. The same problem can happen if the calling code takes the address of an array index operation from a NULL pointer.
One solution to this is to trim the low order bits:
void Set(int* i) {
ASSERT((reinterpret_cast<size_t>(i))>>10);
ipt = i;
set = true;
}
But how many bits should/can I get rid of?
Edit: I'm not worried about undefined behavior, as I'll be aborting (but more cleanly than a seg-v) in that case anyway.
FWIW: this is a semi-hypothetical situation. The bug that caused me to think of this was fixed before I posted, but I've run into it before and am thinking of how to work with it in the future.
Things that can be assumed for the sake of argument:
If Set is called with something that will seg-v, that's a bug
Set may be called by code that isn't my job to fix. (E.g. I file a bug)
Set may be called by code I'm trying to fix. (E.g. I'm adding sanity checks as part of my debugging work.)
Get may be called in a way that provides no information about where Set was called. (I.e. allowing Get to seg-v isn't an effective way to debug anything.)
The code needn't be portable or catch 100% of bad pointers. It need only work on my current system often enough to let me find where things are going wrong.
There is no portable way to test for any invalid pointer except NULL. Evaluating &ip[3] gives undefined behaviour, before you do anything with it; the only solution is to test for NULL before doing any arithmetic on the pointer.
If you don't need portability, and don't need to guarantee that you catch all errors, then on most mainstream platforms you could check whether the address is within the first page of memory; it's common to define NULL to be address zero, and to reserve the first page to trap most null pointer dereferences. On a POSIX platform, this would look something like
static size_t page_size = sysconf(_SC_PAGESIZE);
assert(reinterpret_cast<uintptr_t>(i) >= page_size);
But this isn't a complete solution. The only real solution is to fix whatever is abusing null pointers in the first place.
You shouldn't be doing pointer arithmetic (including array indexing) off of a null pointer at all.
And you should use 0, not NULL, in C++. NULL is a feature of C, still supported but not idiomatic in C++.
In regard to BCS's many comments and the edit: that changes the question from the rather naive one on the surface to a much deeper one. But it is not going to be easy, in a language as permissive as C++, to protect yourself against people doing stupid things before calling your code.
Trying to work around undefined behavior will always be very dependent on your platform, compiler, version, etc., if it is possible at all.
Common *nixes never map the first page of the address space, precisely so that null pointer accesses fault; thus you might get away with checking whether the pointer value is between 0 and 4096 (or whatever page size your system uses).
But don't do this; you can't guard against everything that can go wrong. Focus instead on getting the code right. If someone passes you an invalid pointer, chances are there's something gravely wrong anyway that a pointer validation check can't fix.
Is there any way you can exert some influence to get that bad code corrected? There is no possible way this can turn out well. Legally, just creating an invalid pointer is undefined behavior.
If Set is always going to be passed a small offset from ip, and ip will always be initialized to NULL, you are probably going to be OK with what you are doing. Most modern systems do have the null pointer constant as all bits zero, and most will do the natural thing. There is of course absolutely no guarantee that it will work on any given system with any given compiler and any given compiler options, and changing any of those might cause it to fail.
Since any use of bad pointers can cause program failure, you should consider what happens when the code triggers a memory violation.
Also, I don't know what your ASSERT macro does, but assert, in most implementations, is only activated in debug mode. If you want to push this piece of junk into production, or run in optimized mode, you might want to make sure it will still fail more gently.
If you don't mind a really bad hack, you can force a memory access with volatile (n.b. volatile is evil). According to the GCC docs, volatile accesses must be ordered across sequence points, so you can do something like this:
int test = *(volatile int *)i;
*(volatile int *)i = test;
I don't think = is a sequence point, but the following might also work:
*(volatile int *)i = *(volatile int *)i;
I really wouldn't recommend trying to work around a bug in somebody else's code. If you're not running everything you write through a debugger while you're developing, no amount of checks is going to help you catch all the problems. Get them to fix their code.
If you're not using a debugger, get a decent crash handler that dumps the callstack for each thread and as much additional information regarding the program state as possible. Try to figure out what could be going wrong from that.
Regularly running your code through static analysis tools can also help here.
Remember, that it might not be someone forgetting to initialise a pointer, it could be someone else overwriting that pointer through a bad memory write from somewhere completely unrelated. There are tools which can help track down such things too.
Regarding the NULL vs. 0 debate, #define NULL 0 is better for a couple of reasons:
1) You can more easily see when you're dealing with a pointer.
2) Using NULL offers no less or more safety than using 0. So why not make your code more readable?
3) When C++11 is finally released #define NULL nullptr is a lot easier to change than all those zeros. (You could go the other way and #define nullptr 0 today I suppose, but that will probably cause problems in the future if you're developing cross platform code.)
And for the record, the C++ standard explicitly states that a null pointer constant is an integral constant expression that evaluates to zero. So please let's not have any more nonsense about null pointers not having to equal zero.
One reason, among many, you cannot do this in a portable fashion is that NULL is not guaranteed to be 0. It is only specified that null pointers will compare equal to 0. You may write a 0 (or the preprocessor macro "NULL") in your code, but the compiler knows that this 0 is in a pointer context so it generates the appropriate code to compare it to a null pointer, whatever the actual implementation of a null pointer is. See here and here for more information on that. Reinterpreting a NULL pointer as an integral type may cause it to have a true value instead of false.
You'd have to consider your particular operating system and hardware architecture. If you're only interested in detecting pointers that are "close to null" then you could use ASSERT(reinterpret_cast<uintptr_t>(i) >= pageSize), assuming that the first page is never mapped by your OS.
But ... the obvious question is: Why bother? The OS will detect the null in this case and SEGV as you pointed out, which is just as good as an ASSERT, isn't it?
Most people use pointers like this...
if ( p != NULL ) {
DoWhateverWithP();
}
However, if the pointer is null for whatever reason, the function won't be called.
My question is, could it possibly be more beneficial to just not check for NULL? Obviously on safety critical systems this isn't an option, but your program crashing in a blaze of glory is more obvious than a function not being called if the program can still run without it.
In relation to the first question, do you always check for NULL before you use pointers?
Secondly, consider you have a function that takes a pointer as an argument, and you use this function multiple times on multiple pointers throughout your program. Do you find it more beneficial to test for NULL in the function (the benefit being you don't have to test for NULL all over the place), or on the pointer before calling the function (the benefit being no overhead from calling the function)?
You are right in thinking that NULL pointers often result in immediate crashes, but do not forget that if you are indexing into a large array through a NULL pointer, you might indeed get a valid memory address if your index is high enough. And then, you'll get memory corruption or incorrect memory reads, which will be much harder to locate.
Whenever I can assume that calling a function with NULL is a bug, which should never happen in production code, I prefer using ASSERT guards in the function, which are only compiled into real code in a debug build, and not checking for NULL otherwise.
And from my point of view, generally, a function should check its arguments, not the caller. You should always assume that your callers might have been a bit sloppy about the checking, or that they might contain bugs...
Moral: check for NULL in the function being called, either through an if() statement that throws, or using some ASSERT construct (possibly with a clear message of why this happened). Also check for NULL in the callers, but only if the callers know that this condition might happen during normal program execution, and act accordingly.
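A minimal sketch of that combined approach; Widget and the message are hypothetical stand-ins:
#include <cassert>
#include <cstddef>
#include <stdexcept>

struct Widget { void frob() {} }; // stand-in type for the sketch

void DoWhateverWithP(Widget *p)
{
    assert(p != NULL && "DoWhateverWithP called with NULL"); // debug builds stop here
    if (p == NULL)
        throw std::invalid_argument("p must not be NULL");   // release builds fail fast
    p->frob();
}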
When it's acceptable for the program to just crash if a NULL pointer comes up, I'm partial to:
assert(p);
DoWhateverWithP();
This will only check the pointer in debug builds since defining NDEBUG makes assert() expand to nothing at the preprocessor level. It documents your assumption and assists with debugging but has zero performance impact on the released binary (though, to be fair, checking for a NULL pointer should have effectively zero impact on performance in the vast majority of circumstances).
As a side benefit, this is legal for C as well as C++ and, in the latter case, doesn't require exceptions to be enabled in your compiler/runtime.
Concerning your second question, I prefer to put the assertions at the beginning of the subroutine. Again, the beauty of assert() is the fact that there's really no 'overhead' to speak of. As such, there's nothing to weigh against the benefits of only requiring one assertion in the subroutine definition.
Of course, the caveat is that you never want to assert an expression with side-effects:
assert(p = malloc(1)); // NEVER DO THIS!
DoSomethingWithP(); // If NDEBUG was defined, malloc() was never called!
Don't make it a rule to just check for null and do nothing if you find it.
If the pointer is allowed to be null, then you have to think about what your code does in the case that it actually is null. Usually, just doing nothing is the wrong answer. With care it's possible to define APIs which work like that, but this requires more than just scattering a few NULL checks about the place.
So, if the pointer is allowed to be null, then you must check for null, and you must do whatever is appropriate.
If the pointer is not allowed to be null, then it's perfectly reasonable to write code which invokes undefined behaviour if it is null. It's no different from writing string-handling routines which invoke undefined behaviour if the input is not NUL-terminated, or writing buffer-using routines which invoke undefined behaviour if the caller passes in the wrong value for the length, or writing a function that takes a FILE* parameter, and invokes undefined behaviour if the user passes in a file descriptor reinterpret_cast to FILE*. In C and C++, you simply have to be able to rely on what your caller tells you. Garbage in, garbage out.
However, you might like to write code which helps out your caller (who is probably you, after all) when the most likely kinds of garbage are passed in. Asserts and exceptions are good for this.
Taking up the analogy from Franci's comment on the question: most people do not look for cars when crossing a footpath, or before sitting down on their sofa. They could still be hit by a car. It happens. But it would generally be considered paranoid to spend any effort checking for cars in those circumstances, or for the instructions on a can of soup to say "first, check for cars in your kitchen. Then, heat the soup".
The same goes for your code. It's much easier to pass an invalid value to a function than it is to accidentally drive your car into someone's kitchen. But it's still the fault of the driver if they do so and hit someone, not a failure of the cook to exercise due care. You don't necessarily want cooks (or callees) to clutter up their recipes (code) with checks that ought to be redundant.
There are other ways to find problems, such as unit tests and debuggers. In any case it is much safer to create a car-free environment except where necessary (roads), than it is to drive cars willy-nilly all over the place and hope everybody can cope with them at all times. So, if you do check for null in cases where it isn't allowed, you shouldn't let this give people the idea that it is allowed after all.
[Edit - I literally just hit an example of a bug where checking for null would not find an invalid pointer. I'm going to use a map to hold some objects. I will be using pointers to those objects (to represent a graph), which is fine because map never relocates its contents. But I haven't defined an ordering for the objects yet (and it's going to be a bit tricky to do so). So, to get things moving and prove that some other code works, I used a vector and a linear search instead of a map. That's right, I didn't mean vector, I meant deque. So after the first time the vector resized, I wasn't passing null pointers into functions, but I was passing pointers to memory which had been freed.
I make dumb errors which pass invalid garbage approximately as often as I make dumb errors which pass null pointers invalidly. So regardless of whether I add checking for null, I still need to be able to diagnose problems where the program just crashes for reasons I can't check. Since this will also diagnose null pointer accesses, I usually don't bother checking for null unless I'm writing code to generally check the preconditions on entry to the function. In that case it should if possible do a lot more than just check null.]
I prefer this style:
if (p == NULL) {
// throw some exception here
}
DoWhateverWithP();
This means that whatever function this code lives in will fail quickly in the event that p is NULL. You are correct that if p is NULL there is no way that DoWhateverWithP can execute, but using a null pointer or simply not executing the function are both unacceptable ways to handle the fact that p is NULL.
The important thing to remember is to exit early and fail fast - this kind of approach yields code that is easier to debug.
In addition to the other answers, it depends upon what NULL signifies. For example, this code is perfectly OK, and is pretty idiomatic:
while (fgets(buf, sizeof buf, fp) != NULL) {
process(buf);
}
Here, NULL value indicates not only error, but end-of-file condition as well. Similarly, strtok() returns NULL to say, "there are no more tokens" (although one should avoid strtok() to begin with, but I digress). In cases like this, it is perfectly OK to call a function if the returned pointer is not NULL, and do nothing otherwise.
Edit: another example, closer to what was asked:
const char *data = "this;is;a;test;";
const char *curr = data;
const char *p;
while ((p = strchr(curr, ';')) != NULL) {
/* process data in [curr, p) */
process(curr, p);
curr = p + 1;
}
Once again, NULL here is an indication from strchr() that it couldn't find a ;, and that we should stop processing the data further.
Having said that, if NULL is not used as an indication, then it depends:
If the pointer can't be NULL at this point in code, it's useful to have an assert(p != NULL); when developing, and also having a fprintf(stderr, "Can't happen\n"); or equivalent statement, and then take whatever action as appropriate (abort() or similar is probably the only sane choice at this point).
If the pointer can be NULL, and it's not critical, it might be better to just bypass the usage of the null pointer. Suppose you're trying to allocate memory for writing a log message, and malloc() fails: you shouldn't abort the program because of this. If malloc() succeeds, you want to call a function (sprintf()/whatever) to fill the buffer. A sketch follows this list.
If the pointer can be NULL, and it's critical. In this case, you probably want to fail, and hopefully such conditions don't happen too often.
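Here is a small sketch of the second, non-critical case (the logging scenario above; the function itself is a hypothetical illustration):
#include <cstdio>
#include <cstdlib>

void log_event(const char *who)
{
    char *buf = static_cast<char *>(std::malloc(256));
    if (buf == NULL)
        return; // logging is best-effort: dropping a message beats aborting
    std::snprintf(buf, 256, "event from %s\n", who);
    std::fputs(buf, stderr);
    std::free(buf);
}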
Secondly, consider you have a function that takes a pointer as an argument, and you use this function multiple times on multiple pointers throughout your program. Do you find it more beneficial to test for NULL in the function (the benefit being you don't have to test for NULL all over the place), or on the pointer before calling the function (the benefit being no overhead from calling the function)?
This depends upon a lot of factors. If I can be sure sometimes or most of the times that the pointer passed to a function cannot be NULL, the extra check in the function is wasteful. If the pointer passed comes out of a lot of places, and it's tricky to put in a check everywhere, sure, then the check is good to have in the function itself.
The standard library functions, for the most part, don't check for NULL: str*, mem* functions for example. An exception is free(), it does check for NULL.
A comment about assert: assert is a no-op if NDEBUG is defined, so one should not rely on it for error handling; its only use is during development, to catch programming errors. Also, in C89, assert takes an int, so assert(p != NULL) is better in such cases than just a plain assert(p).
This non-NULLness check can be avoided by using references instead of pointers. This way, the compiler ensures the parameter passed is not NULL. For example:
void f(Param& param)
{
// "param" is a pointer that is guaranteed not to be NULL
}
In this case, it is up to the client to do the checking. However, mostly the client situation will be like this:
Param instance;
f(instance);
No non-NULLness checking is needed.
When using with objects allocated on the heap, you can do the following:
Param& instance = *new Param();
f(instance);
Update: As user Crashworks remarks, it is still possible to make you program crash. However, when using references, it is the responsibility of the client to pass a valid reference, and as I show in the example, this is very easy to do.
How about: a comment clarifying the intent? If the intent is "this can't happen", then perhaps an assert would be the right thing to do instead of the if statement.
On the other hand, if a null value is normal, perhaps an "else comment" explaining why we can skip the "then" step would be in order. Steve McConnell has a good section in "Code Complete" about if/else statements, and how a missing else is a very common error (distracted, forgot it?).
Consequently, I usually put a comment in for a "no-op else", unless it is something of the form "if done, return/break".
When you check for NULL, it is not good idea just to skip the function call. You should have an else-part that does something meaningful in case of NULL, for example throws an error or returns error code to upper level.
On the other hand, NULL is not always an error. It is often used to indicate for example that end of data has been reached. In such case, you will have to handle the situation as normal program flow.
Well, the answer to the first question is: you are talking about an ideal situation; most of the code that I see which uses if ( p != NULL ) is legacy. Also, suppose you want to return an evaluator, and then call the evaluator with the data, but there is no evaluator for that data: it makes logical sense to return NULL and check for NULL before calling the evaluator.
The answer to the second question is: it depends on the situation. delete, for example, checks for the NULL pointer, whereas lots of other functions don't. Sometimes, if you test the pointer inside the function, then you might have to test it in lots of functions, like:
ABC(p);
a = DEF(p);
d = GHI(a);
JKL(p, d);
but this code would be much better:
if(p)
{
ABC(p);
a = DEF(p);
d = GHI(a);
JKL(p, d);
}
Could it possibly be more beneficial to just not check for NULL?
I wouldn't do it, I favor assertions on the frontline and some form of recovery in the body past that. What would assertions not provide to you, that not checking for null would? Similar effect, with easier interpretation and a formal acknowledgement.
In relation to the first question, do you always check for NULL before you use pointers?
It really depends on the code and the time available, but I am irritatingly good at it; a good chunk of 'implementation' in my programs consists of what a program should not do, rather than the usual 'what it should do'.
Secondly, consider you have a function that takes a pointer as an argument...
I test it in the function, as the function is (hopefully) the program that is reused more frequently. I also tend to test it before making the call; without that test, the error loses localization (useful for reporting and isolation).
This is the form I think I've seen more of. This way you don't proceed if you know it's going to blow up anyway.
if (NULL == p)
{
goto FunctionExit; // or some other common label to exit the function.
}
I think it is better to check for null. Although, you can cut down on the amount of checks you need to make.
For most cases I prefer a simple guard clause at the top of a function:
if (p == NULL) return;
That said, I typically only put the check on functions that are publicly exposed.
However, when the null pointer is unexpected I will throw an exception. (There are some functions it doesn't make any sense to call with null, and the consumer should be responsible enough to use it right.)
Constructor initialization can be used as an alternative to checking for null all the time. This is especially useful when the class contains a collection. The collection can be used throughout the class without checking whether it has been initialized.
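A brief sketch of that idea; the class and its members are hypothetical:
#include <cstddef>
#include <string>
#include <vector>

class NameRegistry {
public:
    NameRegistry() : names_() {} // collection is always constructed, never null

    void add(const std::string &n) { names_.push_back(n); }

    bool contains(const std::string &n) const
    {
        // no "is names_ initialized?" check needed anywhere in the class
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == n) return true;
        return false;
    }

private:
    std::vector<std::string> names_;
};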
Dereferencing a null pointer is undefined behavior. If you want to crash if the pointer is null, use an assert or something similar (and, depending on the defined behavior of your class, that can be a perfectly valid response - it's certainly better than continuing to run when people may be expecting something to have been done!).
Since the behavior of dereferencing a null pointer is undefined, it can do anything. Crash, corrupt memory, create a wormhole to an alternate dimension allowing the Elder Gods to come forth and devour all of mankind... anything. While bugs happen, depending upon undefined behavior is, by definition, a bug. So don't do it deliberately.
#define SAFE_DELETE(a) if( (a) != NULL ) delete (a); (a) = NULL;
OR
template<typename T> void safe_delete(T*& a) {
delete a;
a = NULL;
}
or any other better way
I would say neither, as both will give you a false sense of security. For example, suppose you have a function:
void Func( SomePtr * p ) {
// stuff
SafeDelete( p );
}
You set p to NULL, but the copies of p outside the function are unaffected.
However, if you must do this, go with the template - macros will always have the potential for tromping on other names.
Clearly the function, for a simple reason. The macro evaluates its argument multiple times. This can have evil side effects. Also the function can be scoped. Nothing better than that :)
delete a;
ISO C++ specifies, that delete on a NULL pointer just doesn't do anything.
Quote from iso 14882:
5.3.5 Delete [expr.delete]
2 [...] In either alternative, if the value of the operand of delete is the
null pointer the operation has no effect. [...]
Regards, Bodo
/edit: I didn't notice the a=NULL; in the original post, so new version: delete a; a=NULL; however, the problem with setting a=NULL has already been pointed out (false feeling of security).
Generally, prefer inline functions over macros, as macros don't respect scope, and may conflict with some symbols during preprocessing, leading to very strange compile errors.
Of course, sometimes templates and functions won't do, but here this is not the case.
Additionally, the better safe-delete is not necessary, as you could use smart-pointers, therefore not requiring to remember using this method in the client-code, but encapsulating it.
(edit) As others have pointed out, safe-delete is not safe, as even if somebody does not forget to use it, it still may not have the desired effect. So it is actually completely worthless, because using safe_delete correctly needs more thought than just setting to 0 by oneself.
You don't need to test for nullity with delete, it is equivalent to a no-op. (a) = NULL makes me lift an eyebrow. The second option is better.
However, if you have a choice, you should use smart pointers, such as std::auto_ptr or tr1::shared_ptr, which already do this for you.
I think
#define SAFE_DELETE(pPtr) { delete pPtr; pPtr = NULL; } is better
It's OK to call delete if pPtr is NULL, so the check is not required.
Also, if you call SAFE_DELETE(ptr+i), it will result in a compilation error.
A template definition will create multiple instances of the function, one for each data type. In my opinion, in this case these multiple definitions do not add any value.
Moreover, with a template function definition, you have the overhead of a function call.
Usage of SAFE_DELETE really appears to be a C programmer's approach to commandeering the built-in memory management in C++. My question is: will C++ allow this method of using SAFE_DELETE on pointers that have been properly encapsulated as private? Would this macro ONLY work on pointers that are declared public? OOP BAD!!
As mentioned quite a bit above, the second one is the better one, not a macro with potential unintended side effects, doesn't have the unneeded check against NULL (although I suspect you are doing that as a type check), etc. But neither are promising any safety. If you do use something like tr1::smart_ptr, please make sure you read the docs on them and are sure that it has the right semantics for your task. I just recently had to hunt down and clean up a huge memory leak due to a co-worker putting smart_ptrs into a data structure with circular links :) (he should have used weak_ptrs for back references)
I prefer this version:
~scoped_ptr() {
delete this->ptr_; //this-> for emphasis, ptr_ is owned by this
}
Setting the pointer to null after deleting it is quite pointless, as the only reason that you would use pointers is to allow an object to be referenced in multiple places at once. Even if the pointer in one part of the program is 0 there may well be others that are not set to 0.
Furthermore the safe_delete macro / function template is very difficult to use right, because there are only two places that it can be used if there is code that may throw between the new and delete for the given pointer.
1) Inside either a catch (...) block that rethrows the exception and also duplicated next to the catch (...) block for the path that doesn't throw. (Also duplicated next to every break, return, continue etc that may allow the pointer to fall out of scope)
2) Inside a destructor for an object that owns the pointer (unless there is no code between the new and delete that can throw).
Even if there is no code that could throw when you write the code, this could change in the future (all it takes is for someone to came along and add another new after the first one). It is better write code in a way that stays correct even in the face of exceptions.
Option 1 creates so much code duplication and is so easy to get wrong that I am reluctant even to call it an option.
Option 2 makes safe_delete redundant, as the ptr_ that you are setting to 0 will go out of scope on the next line.
In summary -- don't use safe_delete as it is not safe at all (it is very difficult to use correctly, and leads to redundant code even when its use is correct). Use SBRM and smart pointers.
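For illustration, a minimal SBRM sketch along the lines of the scoped_ptr above; Widget is a stand-in type:
struct Widget { void doWork() {} }; // stand-in type for the sketch

class scoped_widget {
public:
    explicit scoped_widget(Widget *p) : ptr_(p) {}
    ~scoped_widget() { delete ptr_; } // runs on every exit path, even exceptions
    Widget *operator->() const { return ptr_; }
private:
    Widget *ptr_;
    scoped_widget(const scoped_widget &);            // non-copyable
    scoped_widget &operator=(const scoped_widget &);
};

void process()
{
    scoped_widget w(new Widget);
    w->doWork();
} // Widget deleted here automatically; no delete (and no safe_delete) in sight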
I know most people think of this as bad practice, but when you are trying to make your class's public interface work only with references, keeping pointers inside and using them only when necessary, I think there is no way to return something telling the caller that the value they are looking for doesn't exist in the container.
class list {
public:
value &get(type key);
};
Assuming you don't want dangerous pointers showing up in the public interface of the class, how do you signal "not found" in this case? By throwing an exception?
What is your approach to that? Do you return an empty value and check for the empty state of it? I actually use the throw approach but I introduce a checking method:
class list {
public:
bool exists(type key);
value &get(type key);
};
So when I forget to check first that the value exists, I get an exception, one that really is exceptional.
How would you do it?
The STL deals with this situation by using iterators. For example, the std::map class has a similar function:
iterator find( const key_type& key );
If the key isn't found, it returns 'end()'. You may want to use this iterator approach, or to use some sort of wrapper for your return value.
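A small sketch of that approach using std::map; the default-value wrapper is one possible convenience, not part of the STL:
#include <map>
#include <string>

typedef std::map<std::string, int> Table;

int lookup_or_default(const Table &t, const std::string &key, int dflt)
{
    Table::const_iterator it = t.find(key);
    return (it == t.end()) ? dflt : it->second; // end() signals "not found"
}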
The correct answer (according to Alexandrescu) is:
Optional and Enforce
First of all, do use the Accessor, but in a safer way and without reinventing the wheel:
boost::optional<X> get_X_if_possible();
Then create an enforce helper:
template <class T, class E>
T& enforce(boost::optional<T>& opt, E e)
{
    if(!opt)
    {
        throw e;
    }
    return *opt;
}
template <class T>
T& enforce(boost::optional<T>& opt) // E cannot be deduced from a default
{                                   // argument, hence this forwarding overload
    return enforce(opt, std::runtime_error("enforce failed"));
}
// and overloads for T const &
This way, depending on what the absence of the value might mean, you either check explicitly:
if(boost::optional<X> maybe_x = get_X_if_possible())
{
X& x = *maybe_x;
// use x
}
else
{
oops("Hey, we got no x again!");
}
or implicitly:
X& x = enforce(get_X_if_possible());
// use x
You use the first way when you’re concerned about efficiency, or when you want to handle the failure right where it occurs. The second way is for all other cases.
The problem with exists() is that you'll end up searching twice for things that do exist (first check if it's in there, then find it again). This is inefficient, particularly if (as its name of "list" suggests) your container is one where searching is O(n).
Sure, you could do some internal caching to avoid the double search, but then your implementation gets messier, your class becomes less general (since you've optimised for a particular case), and it probably won't be exception-safe or thread-safe.
Don't use an exception in such a case. C++ has a nontrivial performance overhead for such exceptions, even if no exception is thrown, and it additially makes reasoning about the code much harder (cf. exception safety).
Best-practice in C++ is one of the two following ways. Both get used in the STL:
As Martin pointed out, return an iterator. Actually, your iterator can well be a typedef for a simple pointer, there's nothing speaking against it; in fact, since this is consistent with the STL, you could even argue that this way is superior to returning a reference.
Return a std::pair<bool, yourvalue>. This makes it impossible to modify the value, though, since a copy constructor of the pair is called, which doesn't work with reference members. (A sketch follows.)
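A quick sketch of that second option; the function and value type here are hypothetical stand-ins:
#include <string>
#include <utility>

// Returns the value by copy together with a found-flag; no exception involved.
std::pair<bool, std::string> find_name(int key)
{
    if (key == 42)
        return std::make_pair(true, std::string("answer"));
    return std::make_pair(false, std::string()); // false means "not found"
}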
/EDIT:
This answer has spawned quite some controversy, visible from the comments and not so visible from the many downvotes it got. I've found this rather surprising.
This answer was never meant as the ultimate point of reference. The “correct” answer had already been given by Martin: exceptions reflect the behaviour in this case rather poorly. It's semantically more meaningful to use some other signalling mechanism than exceptions.
Fine. I completely endorse this view. No need to mention it once again. Instead, I wanted to give an additional facet to the answers. While minor speed boosts should never be the first rationale for any decision-making, they can provide further arguments and in some (few) cases, they may even be crucial.
Actually, I've mentioned two facets: performance and exception safety. I believe the latter to be rather uncontroversial. While it's extremely hard to give strong exceptions guarantees (the strongest, of course, being “nothrow”), I believe it's essential: any code that is guaranteed to not throw exceptions makes the whole program easier to reason about. Many C++ experts emphasize this (e.g. Scott Meyers in item 29 of “Effective C++”).
About speed. Martin York has pointed out that this no longer applies in modern compilers. I respectfully disagree. The C++ language makes it necessary for the environment to keep track, at runtime, of code paths that may be unwound in the case of an exception. Now, this overhead isn't really all that big (and it's quite easy to verify this). “nontrivial” in my above text may have been too strong.
However, I find it important to draw the distinction between languages like C++ and many modern, “managed” languages like C#. The latter has no additional overhead as long as no exception is thrown, because the information necessary to unwind the stack is kept anyway. By and large, I stand by my choice of words.
STL Iterators?
The "iterator" idea proposed before me is interesting, but the real point of iterators is navigation through a container. Not as an simple accessor.
If you're accessor is one among many, then iterators are the way to go, because you will be able to use them to move in the container. But if your accessor is a simple getter, able to return either the value or the fact there is no value, then your iterator is perhaps only a glorified pointer...
Which leads us to...
Smart pointers?
The point of smart pointers is to simplify pointer ownership. With a shared pointer, you'll get a resource (memory) which will be shared, at the cost of an overhead (shared pointers need to allocate an integer as a reference counter...).
You have to choose: either your Value is already inside a shared pointer, and then you can return this shared pointer (or a weak pointer), or your Value is inside a raw pointer, and then you can return the raw pointer. You don't want to return a shared pointer if your resource is not already inside a shared pointer: a world of funny things will happen when your shared pointer goes out of scope and deletes your Value without telling you...
:-p
Pointers?
If your interface is clear about its ownership of its resources, and about the fact that the returned value can be NULL, then you could return a simple raw pointer. If the user of your code is dumb enough to ignore the interface contract of your object, or to play arithmetic or whatever games with your pointer, then he/she will be dumb enough to break any other way you choose to return the value, so don't bother with the mentally challenged...
Undefined Value
Unless your Value type already has some kind of "undefined" value, and the user knows that and accepts handling it, this is a possible solution, similar to the pointer or iterator solutions.
But do not add an "undefined" value to your Value class because of the problem you asked about: you'll end up raising the "references vs. pointers" war to another level of insanity. Code users want the objects you give them to either be OK, or to not exist. Having to test on every other line whether this object is still valid is a pain, and will uselessly complicate the user's code, through your fault.
Exceptions
Exceptions are usually not as costly as some people would like them to be. But for a simple accessor, the cost could be nontrivial if your accessor is used often.
For example, the STL std::vector has two accessors to its value through an index:
T & std::vector::operator[]( /* index */ )
and:
T & std::vector::at( /* index */ )
The difference is that [] is non-throwing. So, if you access outside the range of the vector, you're on your own, probably risking memory corruption and a crash sooner or later. So you should really be sure you have verified the code using it.
On the other hand, at is throwing. This means that if you access outside the range of the vector, then you'll get a clean exception. This method is better if you want to delegate to another code the processing of an error.
Personally, I use [] when I'm accessing values inside a loop, or something similar. I use at when I feel an exception is the right way to signal to the current code (or the calling code) that something went wrong.
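For illustration, the contrast between the two accessors in a small sketch:
#include <cstddef>
#include <stdexcept>
#include <vector>

int element_or_fallback(const std::vector<int> &v, std::size_t i)
{
    // v[i] performs no range check: out of range is undefined behavior.
    // v.at(i) throws std::out_of_range instead, which a caller can handle.
    try {
        return v.at(i);
    } catch (const std::out_of_range &) {
        return -1; // caller-chosen fallback, purely for the sketch
    }
}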
So what?
In your case, you must choose:
If you really need lightning-fast access, then the throwing accessor could be a problem. But this means you have already used a profiler on your code to determine that this is a bottleneck, doesn't it?
;-)
If you know that not having a value can happen often, and/or you want your client to propagate a possible null/invalid/whatever semantic pointer to the value accessed, then return a pointer (if your value is inside a simple pointer) or a weak/shared pointer (if your value is owned by a shared pointer).
But if you believe the client won't propagate this "null" value, or that they should not propagate a NULL pointer (or smart pointer) in their code, then use the reference protected by the exception. Add a "hasValue" method returning a boolean, and throw should the user try to get the value when there is none.
Last but not least, consider the code that will be used by the user of your object:
// If you want your user to have this kind of code, then choose either
// pointer or smart pointer solution
void doSomething(MyClass & p_oMyClass)
{
MyValue * pValue = p_oMyClass.getValue() ;
if(pValue != NULL)
{
// Etc.
}
}
MyValue * doSomethingElseAndReturnValue(MyClass & p_oMyClass)
{
MyValue * pValue = p_oMyClass.getValue() ;
if(pValue != NULL)
{
// Etc.
}
return pValue ;
}
// ==========================================================
// If you want your user to have this kind of code, then choose the
// throwing reference solution
void doSomething(MyClass & p_oMyClass)
{
if(p_oMyClass.hasValue())
{
MyValue & oValue = p_oMyClass.getValue() ;
}
}
So, if your main problem is choosing between the two user codes above, your problem is not about performance but about "code ergonomics". Thus, the exception solution should not be put aside because of potential performance issues.
:-)
Accessor?
The "iterator" idea proposed before me is interesting, but the real point of iterators is navigation through a container. Not as an simple accessor.
I agree with paercebal: an iterator is for iterating. I don't like the way the STL does it. But the idea of an accessor seems more appealing. So what do we need? A container-like class that feels like a boolean for testing but behaves like the original return type. That would be feasible with cast operators.
template <typename T> class Accessor {
public:
Accessor(): _value(NULL)
{}
Accessor(T &value): _value(&value)
{}
operator T &() const
{
if (!_value)
throw Exception("that is a problem and you made a mistake somewhere.");
else
return *_value;
}
operator bool () const
{
return _value != NULL;
}
private:
T *_value;
};
Now, any foreseeable problem? An example usage:
Accessor <type> value = list.get(key);
if (value) {
type &v = value;
v.doSomething();
}
How about returning a shared_ptr as the result. This can be null if the item wasn't found. It works like a pointer, but it will take care of releasing the object for you.
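A sketch of that variant; boost::shared_ptr is used since this predates std::shared_ptr, and the container is hypothetical:
#include <boost/shared_ptr.hpp>
#include <map>

typedef boost::shared_ptr<int> ValuePtr;
typedef std::map<int, ValuePtr> Store;

ValuePtr find_value(const Store &s, int key)
{
    Store::const_iterator it = s.find(key);
    if (it == s.end())
        return ValuePtr(); // empty (null) shared_ptr signals "not found"
    return it->second;     // ownership is shared; no manual release needed
}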
(I realize this is not always the right answer, and my tone a bit strong, but you should consider this question before deciding for other more complex alternatives):
So, what's wrong with returning a pointer?
I've seen this one many times in SQL, where people will do their utmost to never deal with NULL columns, like they have some contagious disease or something. Instead, they cleverly come up with a "blank" or "not-there" artificial value like -1, 9999 or even something like '#X-EMPTY-X#'.
My answer: the language already has a construct for "not there"; go ahead, don't be afraid to use it.
What I prefer doing in situations like this is having a throwing "get", and for those circumstances where performance matters or failure is common, a "tryGet" function along the lines of bool tryGet(type key, value **pp), whose contract is that if true is returned then *pp is a valid pointer to some object, else *pp is null.
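A sketch of that get/tryGet pair on a hypothetical container:
#include <cstddef>
#include <map>
#include <stdexcept>

class IntTable {
public:
    // tryGet: never throws; reports absence through the return value.
    bool tryGet(int key, int **pp)
    {
        std::map<int, int>::iterator it = data_.find(key);
        *pp = (it == data_.end()) ? NULL : &it->second;
        return *pp != NULL;
    }

    // get: throwing convenience wrapper for callers that expect success.
    int &get(int key)
    {
        int *p;
        if (!tryGet(key, &p))
            throw std::out_of_range("key not found");
        return *p;
    }

private:
    std::map<int, int> data_;
};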
@aradtke, you said:
I agree with paercebal, an iterator is to iterate. I don't like the way STL does. But the idea of an accessor seems more appealing. So what we need? A container like class that feels like a boolean for testing but behaves like the original return type. That would be feasible with cast operators. [..] Now, any foreseeable problem?
First, YOU DO NOT WANT OPERATOR bool. See Safe Bool idiom for more info. But about your question...
Here's the problem: users now need to cast explicitly in some cases. Pointer-like proxies (such as iterators, ref-counted pointers, and raw pointers) have a concise 'get' syntax. Providing a conversion operator is not very useful if callers have to invoke it with extra code.
Starting with your reference-like example, the most concise way to write it:
// 'reference' style, check before use
if (Accessor<type> value = list.get(key)) {
type &v = value;
v.doSomething();
}
// or
if (Accessor<type> value = list.get(key)) {
static_cast<type&>(value).doSomething();
}
This is okay, don't get me wrong, but it's more verbose than it has to be. Now consider if we know, for some reason, that list.get will succeed. Then:
// 'reference' style, skip check
type &v = list.get(key);
v.doSomething();
// or
static_cast<type&>(list.get(key)).doSomething();
Now lets go back to iterator/pointer behavior:
// 'pointer' style, check before use
if (Accessor<type> value = list.get(key)) {
value->doSomething();
}
// 'pointer' style, skip check
list.get(key)->doSomething();
Both are pretty good, but pointer/iterator syntax is just a bit shorter. You could give 'reference' style a member function 'get()'... but that's already what operator*() and operator->() are for.
The 'pointer' style Accessor now has operator 'unspecified bool', operator*, and operator->.
And guess what... raw pointer meets these requirements, so for prototyping, list.get() returns T* instead of Accessor. Then when the design of list is stable, you can come back and write the Accessor, a pointer-like Proxy type.
Interesting question. It's a problem in C++ to use references exclusively, I guess; in Java references are more flexible and can be null. I can't remember if it's legal C++ to force a null reference:
MyType *pObj = nullptr;
return *pObj;
But I consider this dangerous. Again in Java I'd throw an exception as this is common there, but I rarely see exceptions used so freely in C++.
If I were making a public API for a reusable C++ component and had to return a reference, I guess I'd go the exception route.
My real preference is to have the API return a pointer; I consider pointers an integral part of C++.