Avoiding null pointer crashes in C++ by overloading operators - bad practice?

I'm starting to write a rather large Qt application and instead of using raw pointers I want to use smart pointers, as well as Qt's own guarded pointer called QPointer.
With both standard library smart pointers and Qt's pointers the application crashes when a NULL pointer is dereferenced.
My idea was that I could add a custom overload to the dereference operators * and -> of these pointer types that check if the pointer is NULL.
Below is a short example that works fine so far. If a NULL pointer is dereferenced, a temporary dummy object is created so that the application does not crash. How this dummy object is processed might not always be correct, but at least there would be no crash, and I could even react to it by showing a warning or writing to a log file.
template <class T>
class Ptr : public std::shared_ptr<T> {
private:
T * m_temp;
public:
Ptr(T * ptr) : std::shared_ptr<T>(ptr), m_temp(NULL) {}
~Ptr() {
if (m_temp) {
delete m_temp;
}
}
T * operator->() {
if (!std::shared_ptr<T>::get()) {
if (m_temp) {
delete m_temp;
}
m_temp = new T();
return m_temp;
} else {
return std::shared_ptr<T>::get();
}
}
T & operator*() {
return *operator->();
}
};
Of course I'll be doing NULL checks and will try to eliminate the sources of NULL pointers as much as possible, but for the rare case that I forget a NULL check and the exception occurs, could this be a good way of handling it? Or is this a bad idea?

I would say this is a bad idea for a few reasons:
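You should not derive from standard library types like shared_ptr. They are not designed as base classes: shared_ptr has no virtual destructor, and its operators are not virtual, so your overloads are bypassed whenever the object is used as a plain std::shared_ptr<T>. It may work until you change something benign in your code and then it breaks. There are various things you can do to make this more acceptable, but the easiest thing is to just not do this.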
There are more ways to create a shared_ptr than just a constructor call. Duplicating the pointer value in your m_temp variable is likely just to lead things to be out of sync and cause more problems. By the time you cover all the bases, you will have probably re-implemented the whole shared_ptr class.
m_temp = new T(); seems like a frankly crazy thing to do if the old pointer is null. What about all the state stored in the object that was previously null? What about constructor parameters? Any initialization for the pointer? Sure, you could maybe handle all of these, but by that point you might as well handle the nullptr check elsewhere where things will be clearer.
You don't want to hide values being nullptr. If you have code using a pointer, it should care about the value of that pointer. If it is null and that is unexpected, then something further up the chain likely went wrong and you should be handling that appropriately (exceptions, error codes, logging, etc.). Silently allocating a new pointer will just hide the original source of the error. Whenever there is something wrong in a program, you want to stop or address the problem as close to the source as possible - it makes debugging the problem simpler.
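One way to act on the last point without subclassing shared_ptr is a small free function that reports a null pointer at the point of use. The sketch below is only an illustration: the name checked, the Widget type and the choice of std::logic_error are assumptions, not part of the original answer.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical helper: dereference a shared_ptr, but fail loudly if it is
// null instead of fabricating a replacement object.
template <class T>
T& checked(const std::shared_ptr<T>& p, const char* what = "pointer") {
    if (!p) {
        throw std::logic_error(std::string("null ") + what + " dereferenced");
    }
    return *p;
}

struct Widget {  // hypothetical example type
    int value = 42;
};

// Returns true if dereferencing a null pointer through checked() was reported.
bool null_is_reported() {
    std::shared_ptr<Widget> empty;
    try {
        checked(empty, "Widget").value = 1;
    } catch (const std::logic_error&) {
        return true;  // the error surfaces here, close to its source
    }
    return false;
}
```

The exception can be caught where a warning or log entry makes sense, which keeps the "no crash, but I get told" goal from the question without inventing a dummy object.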
A side note, if you are confident that your pointers are not null and don't want to have to deal with nullptr in a block of code, you may be able to use references instead. For example:
void fun1(MyObject* obj) {}
void fun2(MyObject& obj) {}
In fun1, the code might need to check for nullptr to be well written. In fun2, there is no need to check for nullptr because if someone converts a nullptr to a reference they have already broken the rules. fun2 pushes any responsibility for checking the pointer value higher up the stack. This can be good in some cases (just don't try and store the reference for later). Note that you can use operator * on a shared_ptr/unique_ptr to get a reference directly.
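The last remark can be sketched as follows: the null check happens once at the call site, and the callee receives a reference. The calls member and the helper call_if_present are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

struct MyObject {   // stand-in for the MyObject type in the answer
    int calls = 0;
};

// Takes a reference: the function body never needs a nullptr check.
void fun2(MyObject& obj) { ++obj.calls; }

// Hypothetical call site: check once, then pass *ptr as a reference.
bool call_if_present(const std::shared_ptr<MyObject>& ptr) {
    if (!ptr) {
        return false;  // handle the missing object here, where the cause is known
    }
    fun2(*ptr);        // operator* on shared_ptr yields MyObject&
    return true;
}
```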

Related

Rely on boolean value to access pointers

I have a class called MyClass that contains a pointer to an object, like so:
class MyClass {
public:
MyClass() : _blob(nullptr) {}
~MyClass() { free(); }
void generate() {
if(!_blob)
_blob = new Blob;
}
void free() {
if(_blob) {
delete _blob;
_blob = nullptr;
}
}
Blob* getBlob() { return _blob; }
private:
Blob *_blob;
};
The reason why it holds a pointer to a heap allocated object and doesn't contain it on the stack is because some MyClass instances don't allocate data for this object, therefore before accessing it I need to check if it's not a null pointer:
if(myClass->getBlob())
myClass->getBlob()->doSomething();
Now I had this idea to store a bool variable called _hasBlob and then use it like this
if(myClass->hasBlob())
myClass->getBlob()->doSomething();
Is this considered faster and legit? Or is it considered bad practice because I can see the danger potential of somehow ending up dereferencing a nullptr.

It's legit but redundant. It's not faster. In fact, while the check itself is just as fast, keeping the boolean in sync with the validity of the pointer is marginally slower. The worst thing about it is the maintenance burden of proving that the boolean is always in sync. Once it's implemented and proven correct, though, it's just redundant and a waste of memory.

In general, you would expect MyClass::getBlob() to always give you a valid object whenever you ask for it. So in general my suggestion would be to implement that method as follows:
Blob* getBlob() {
if(_blob == nullptr) {
_blob = new Blob;
// Or call private method generateBlob, if there is a lot of logic
}
return _blob;
}
Alternatively, if default-constructed Blobs are small and you don't have thousands of MyClass instances, you could just create the Blob object in the constructor and have getBlob return a reference, so you never have to check anything once you have a valid MyClass object.
If you want to avoid this automatic creation of Blob objects, you could add a hasBlob check, but instead of keeping a separate boolean, I would just implement it as
bool hasBlob() const { return _blob != nullptr; }
This function is almost certain to be inlined, doesn't cost you any extra storage, and it guarantees the correct result (for instance, it can never happen that you set _hasBlob = true and then fail to allocate a Blob). As you say, there is still a risk of dereferencing a null pointer, but as you have clearly documented (hopefully) that getBlob may return a null pointer if there is no Blob allocated for this instance, that risk is now with the caller and they should take care to check the result, as with any function returning a pointer. In fact, this solution is exactly equivalent to the code you already had, since if(myObject->getBlob()) now does exactly the same check as if(myObject->hasBlob()) would -- the only difference being that perhaps the latter is slightly more self-documenting.
Since you indicate in the comments that you are worried about the performance: I suspect that checking a pointer against null is fairly fast, but as usual, if you want to be sure, the usual advice of "Measure it!" holds. For example, you may find that because of the extra check in my first version of getBlob the compiler will not inline the function.
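As a side sketch, the same class can let std::unique_ptr manage the Blob, so that hasBlob() is derived from the pointer itself and no manual delete is needed. This is an assumption about how one might restructure the code, not something the answers above prescribe; Blob here is a placeholder type with a made-up payload member.

```cpp
#include <cassert>
#include <memory>

struct Blob {            // placeholder for the real Blob type
    int payload = 0;
    void doSomething() { ++payload; }
};

class MyClass {
public:
    void generate() {
        if (!_blob) {
            _blob.reset(new Blob);             // allocate only on demand
        }
    }
    void free() { _blob.reset(); }             // deletes the Blob, leaves _blob null
    bool hasBlob() const { return _blob != nullptr; }
    Blob* getBlob() { return _blob.get(); }    // may return nullptr; callers must check
private:
    std::unique_ptr<Blob> _blob;               // null until generate() is called
};
```

There is no separate boolean to keep in sync, and the destructor generated by the compiler releases the Blob.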

Is this an error in "More Effective C++" in Item28?

I encountered a question when I was reading the item28 in More Effective C++. In this item, the author shows to us that we can use member template in SmartPtr such that the SmartPtr<Cassette> can be converted to SmartPtr<MusicProduct>.
The following code is not the same as in the book, but has the same effect.
#include <iostream>
class Base {};
class Derived : public Base {};
template<typename T>
class smart {
public:
smart(T* ptr)
: ptr(ptr)
{}
template<typename U>
operator smart<U>()
{
return smart<U>(ptr);
}
~smart()
{
delete ptr;
}
private:
T* ptr;
};
void test(const smart<Base>& ) {}
int main()
{
smart<Derived> sd(new Derived);
test(sd);
return 0;
}
It does compile without errors. But when I ran the executable, I got a core dump. I think that's because the conversion operator creates a temporary smart<Base> that holds the same ptr as sd (whose type is smart<Derived>), so delete is applied to that pointer twice. What's more, after calling test we can no longer use sd, since the ptr in sd has already been deleted.
Now my questions are :
Is my thought right? Or my code is not the same as the original code in the book?
If my thought is right, is there any method to do this?
Thanks very much for your help.

Yes, you've described the problem with your code fairly accurately.
As far as how to make it work: just about like the usual when you run into problems from a shallow copy: do a deep copy instead. That is, instead of just creating another pointer to the same data, you'd need to clone the data, and have the second object point to the clone of the data instead of the original data.
Alternatively, use a reference counted pointer, and increment the reference count when you do a copy, and decrement it when a copy is destroyed. When the count reaches zero (and not before) delete the pointee data.
Generally speaking: avoid doing all of this. Assuming you're using a relatively up-to-date compiler, the standard library should already contain a shared_ptr and a unique_ptr that can handle a lot of your smart pointer needs.
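For comparison, here is a minimal sketch of the same Derived-to-Base conversion done with std::shared_ptr. The helper names owners_seen and conversion_is_safe are made up for this example, and Base is given a virtual destructor as an extra assumption so that deletion through a Base pointer is always well defined.

```cpp
#include <cassert>
#include <memory>

struct Base {
    virtual ~Base() = default;   // virtual so deletion through Base is well defined
};
struct Derived : Base {};

// Takes the base-class smart pointer, like test(const smart<Base>&) in the question.
long owners_seen(const std::shared_ptr<Base>& p) { return p.use_count(); }

// Returns true if the converted temporary shared ownership with sd
// instead of deleting the object a second time.
bool conversion_is_safe() {
    std::shared_ptr<Derived> sd = std::make_shared<Derived>();
    // The implicit conversion creates a temporary shared_ptr<Base>
    // that shares the reference count with sd.
    long during = owners_seen(sd);
    return during == 2 && sd.use_count() == 1 && sd != nullptr;
}
```

The temporary raises the count to two for the duration of the call and drops it back to one afterwards, so sd stays usable.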

Your interpretation is correct, the conversion operator will create a different object that holds a pointer to the same underlying object. Once it goes out of scope it will be destroyed, and it will in turn call delete.
Not sure I understand the last question, if what you ask is whether this can be useful or not, it can be useful if implemented correctly. For example, if instead of a raw pointer and manually allocating/deleting the memory you were using a std::shared_ptr then it would work just fine. In other cases there might not even be a dynamically allocated object... This is just a tool, use it where it makes sense.

Function argument as reference to avoid checking for NULL

If I have a function that takes in a pointer which should never be NULL, I usually do something like this:
void Foo(const somePointer* ptr)
{
if (ptr == NULL)
{
// Throw assertion
return;
}
// Do something
}
So I check every time whether the pointer is NULL, but if the pointer was never set to NULL in the first place and never allocated either, the check is useless. So now I am wondering whether I should define my functions like this instead (although I realize that does not guarantee I get a valid object, at least it won't be NULL):
void Foo(const somePointer& ptr)
{
// No need to check anymore, client is responsible
// Do something
}
And before I do (or don't, depending on the answers I get here), I thought I would ask here and see what everyone has to say, especially its pros and cons.

Well, if you never ever want a non-existent object passed in, use a reference (Note: non-existent, not non-valid).
If you want that possibility, use a pointer.

A lot depends on the shape of your code - if you write a lot of stuff like this:
A * a = new A();
f( a );
then it seems sensible for f() to take a pointer, rather than to have to write:
f( *a );
Personally, I almost never check for NULLs, new can't return one, and if you find you have one you are probably already in UB land.

I think it's pointless as a safety check. It's marginally worthwhile as documentation, though.
If you make the change, all that will happen is that some user will change this code:
somePointer *ptr = something();
Foo(ptr);
To this:
somePointer *ptr = something();
Foo(*ptr);
Now, if ptr was null, then the first code is invalid, and it was their fault for passing null into a function whose parameter "should never be NULL". The second code is also invalid, and it was their fault for dereferencing a null pointer.
It's useful as documentation, in that maybe when they type the * character they will think, "oh, hang on, this better not be null". Whereas if all you've done is document that null is an invalid input (like, say, strlen does), then they'd have to read the documentation in order to know not to pass in a null pointer. In theory, the user of your code will check the docs instead of just mashing the keyboard with his face until he has something that compiles, and assuming that will work. In practice, we all have our less intelligent moments.

Returning reference to a pointer- C++

Consider the following class.
class mapping_items
{
public:
mapping_items(){}
void add(const mapping_item* item) {
items_.push_back( item );
}
size_t count() const{
return items_.size();
}
const mapping_item& find(const std::string& pattern){
const mapping_item* item = // iterate vector and find item;
return *item;
}
private:
mapping_items(const mapping_items&); // not allowed
mapping_items& operator=(const mapping_items&); // not allowed
std::vector<const mapping_item*> items_;
};
C++ FAQ says,
Use references when you can, and
pointers when you have to.
So in the above example, should I return const mapping_item& or const mapping_item* ?
The reason why I chose mapping_item& is because there will be always a default return value available. I will never have null returns. So a reference makes it clear that it can't have nulls. Is this the correct design?

There is a problem - what happens if your find() function fails? If this is expected never to happen, you are OK returning a reference (and raise an exception if it happens despite the fact it shouldn't). If on the other hand it may happen (e.g. looking up a name in an address book), you should consider returning a pointer, as a pointer can be NULL, indicating the find failed.
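The two options can be sketched side by side. This is a simplified stand-in for the class in the question: items are stored by value to keep it short, and find_or_null is a hypothetical name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

struct mapping_item {          // simplified stand-in for the question's type
    std::string pattern;
};

class mapping_items {
public:
    void add(const mapping_item& item) { items_.push_back(item); }

    // Failure is an expected outcome: return nullptr when nothing matches.
    const mapping_item* find_or_null(const std::string& pattern) const {
        for (const mapping_item& item : items_) {
            if (item.pattern == pattern) return &item;
        }
        return nullptr;
    }

    // Failure should never happen: return a reference and throw if it does.
    const mapping_item& find(const std::string& pattern) const {
        if (const mapping_item* item = find_or_null(pattern)) return *item;
        throw std::out_of_range("no mapping_item for pattern: " + pattern);
    }
private:
    std::vector<mapping_item> items_;   // stored by value to keep the sketch short
};
```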

This seems like an appropriate design choice to me. As the C++ FAQ states, use references when you can. IMO, unnecessary use of pointers just makes code harder to understand.

Yes, it's the correct design. Clients can rely on values being non-null.
On a related note, some other class is responsible for managing the lifetime of mapping_item's?
Pointers and ownership easily introduce memory leaks or worse. You might want to consider whether you actually need to store pointers, or if you can get away with copying mapping_item's instead, to avoid memory leaks. However, pointers are necessary if you need to manage subclassed mapping_item's. Pointers are advisable if instances are large or need to be shared.
If you really need pointers, consider using boost::shared_ptr<> rather than raw pointers, both inside your class and as parameter types to e.g. the add() function.

Some people say, and I agree,
use pointers if value can be NULL
and references otherwise
As to your example, I'd probably go for return const mapping_item;, so by value, to avoid having a reference to a temporary, and hope for my compiler to optimize copying away.

What are potential dangers when using boost::shared_ptr?

What are some ways you can shoot yourself in the foot when using boost::shared_ptr? In other words, what pitfalls do I have to avoid when I use boost::shared_ptr?

Cyclic references: a shared_ptr<> to something that has a shared_ptr<> to the original object. You can use weak_ptr<> to break this cycle, of course.
I add the following as an example of what I am talking about in the comments.
class node : public enable_shared_from_this<node> {
public :
void set_parent(shared_ptr<node> parent) { parent_ = parent; }
void add_child(shared_ptr<node> child) {
children_.push_back(child);
child->set_parent(shared_from_this());
}
void frob() {
do_frob();
if (parent_) parent_->frob();
}
private :
void do_frob();
shared_ptr<node> parent_;
vector< shared_ptr<node> > children_;
};
In this example, you have a tree of nodes, each of which holds a pointer to its parent. The frob() member function, for whatever reason, ripples upwards through the tree. (This is not entirely outlandish; some GUI frameworks work this way).
The problem is that if you lose your reference to the topmost node, the topmost node still holds strong references to its children, and all its children also hold strong references to their parents. These circular references keep all the instances from cleaning themselves up, even though there is no way of actually reaching the tree from the code, so this memory leaks.
class node : public enable_shared_from_this<node> {
public :
void set_parent(shared_ptr<node> parent) { parent_ = parent; }
void add_child(shared_ptr<node> child) {
children_.push_back(child);
child->set_parent(shared_from_this());
}
void frob() {
do_frob();
shared_ptr<node> parent = parent_.lock(); // Note: parent_.lock()
if (parent) parent->frob();
}
private :
void do_frob();
weak_ptr<node> parent_; // Note: now a weak_ptr<>
vector< shared_ptr<node> > children_;
};
Here, the strong pointer to the parent node has been replaced by a weak pointer. It no longer has a say in the lifetime of the node to which it refers. Thus, if the topmost node goes out of scope as in the previous example, then while it holds strong references to its children, its children don't hold strong references to their parents. Thus there are no strong references to the topmost node, and it cleans itself up. In turn, this causes the children to lose their one strong reference, which causes them to clean up, and so on. In short, this won't leak, just by strategically replacing a shared_ptr<> with a weak_ptr<>.
Note: The above applies equally to std::shared_ptr<> and std::weak_ptr<> as it does to boost::shared_ptr<> and boost::weak_ptr<>.
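The difference between the two versions can be checked with a small counting experiment. This is a simplified sketch, not the node class above: strong_node, weak_node and the live_nodes counter are made-up names, and std::shared_ptr is used in place of boost::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <vector>

int live_nodes = 0;   // node objects that have not been destroyed yet

struct strong_node {
    strong_node() { ++live_nodes; }
    ~strong_node() { --live_nodes; }
    std::shared_ptr<strong_node> parent;               // strong back-reference: forms a cycle
    std::vector<std::shared_ptr<strong_node>> children;
};

struct weak_node {
    weak_node() { ++live_nodes; }
    ~weak_node() { --live_nodes; }
    std::weak_ptr<weak_node> parent;                   // weak back-reference: no cycle
    std::vector<std::shared_ptr<weak_node>> children;
};

// Builds a parent with one child, drops all outside strong references,
// and returns how many nodes are still alive afterwards.
template <class Node>
int nodes_left_after_release() {
    live_nodes = 0;
    std::weak_ptr<Node> probe;            // only used to clean up afterwards
    {
        auto root = std::make_shared<Node>();
        auto child = std::make_shared<Node>();
        root->children.push_back(child);
        child->parent = root;
        probe = root;
    }                                     // root and child go out of scope here
    const int left = live_nodes;
    if (auto root = probe.lock()) {
        root->children.clear();           // break the cycle so the demo itself does not leak
    }
    return left;
}
```

With strong back-references both nodes survive the end of the scope; with the weak back-reference none do.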

Creating multiple unrelated shared_ptr's to the same object:
#include <stdio.h>
#include "boost/shared_ptr.hpp"
class foo
{
public:
foo() { printf( "foo()\n"); }
~foo() { printf( "~foo()\n"); }
};
typedef boost::shared_ptr<foo> pFoo_t;
void doSomething( pFoo_t p)
{
printf( "doing something...\n");
}
void doSomethingElse( pFoo_t p)
{
printf( "doing something else...\n");
}
int main() {
foo* pFoo = new foo;
doSomething( pFoo_t( pFoo));
doSomethingElse( pFoo_t( pFoo));
return 0;
}
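The program above deletes the same foo twice, because each pFoo_t temporary starts its own reference count. A hedged sketch of the usual fix is to wrap the raw pointer exactly once and pass copies of that owner around. It uses std::shared_ptr as an assumption (boost::shared_ptr behaves the same way here), and the counter and the destroyed_exactly_once helper are made up for the example.

```cpp
#include <cassert>
#include <memory>

int foo_destructions = 0;   // how many times ~foo() has run

class foo {
public:
    ~foo() { ++foo_destructions; }
};

typedef std::shared_ptr<foo> pFoo_t;

// Both functions receive a copy that shares the same reference count.
long doSomething(pFoo_t p)     { return p.use_count(); }
long doSomethingElse(pFoo_t p) { return p.use_count(); }

// Returns true if the object was destroyed exactly once.
bool destroyed_exactly_once() {
    foo_destructions = 0;
    {
        pFoo_t owner(new foo);                          // the only place a raw pointer is wrapped
        if (doSomething(owner) != 2) return false;      // owner + parameter copy
        if (doSomethingElse(owner) != 2) return false;
    }                                                   // last owner gone: ~foo() runs once
    return foo_destructions == 1;
}
```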

Constructing an anonymous temporary shared pointer, for instance inside the arguments to a function call:
f(shared_ptr<Foo>(new Foo()), g());
This is because it is permissible for the new Foo() to be executed, then g() called, and g() to throw an exception, without the shared_ptr ever being set up, so the shared_ptr does not have a chance to clean up the Foo object.
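A common way around this is to keep the bare new expression out of the argument list, either by naming the shared_ptr first or by using std::make_shared. The sketch below assumes a C++11 standard library; Foo, f and g are placeholders matching the snippet above, and the live_foos counter is added only to observe the result. C++17 also tightened the evaluation rules for function arguments so that this particular interleaving should no longer occur, but make_shared remains the simpler habit.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int live_foos = 0;   // Foo objects currently alive

struct Foo {
    Foo() { ++live_foos; }
    ~Foo() { --live_foos; }
};

int g() { throw std::runtime_error("g failed"); }   // always throws, for the demo
void f(std::shared_ptr<Foo>, int) {}

// Returns the number of Foo objects still alive after the failed call.
int leaked_after_failed_call() {
    live_foos = 0;
    try {
        // make_shared allocates and takes ownership in one step, so there is
        // no window in which a raw Foo* exists without an owner.
        f(std::make_shared<Foo>(), g());
    } catch (const std::runtime_error&) {
        // g() threw; any Foo that was created has already been released
    }
    return live_foos;
}
```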

Be careful making two pointers to the same object.
boost::shared_ptr<Base> b( new Derived() );
{
boost::shared_ptr<Derived> d( b.get() );
} // d goes out of scope here, deletes pointer
b->doSomething(); // crashes
instead use this
boost::shared_ptr<Base> b( new Derived() );
{
boost::shared_ptr<Derived> d =
boost::dynamic_pointer_cast<Derived,Base>( b );
} // d goes out of scope here, refcount--
b->doSomething(); // no crash
Also, any classes holding shared_ptrs should define copy constructors and assignment operators.
Don't try to use shared_from_this() in the constructor--it won't work. Instead create a static method to create the class and have it return a shared_ptr.
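A minimal sketch of that factory pattern, using std::enable_shared_from_this. Session, create and init are hypothetical names; the point is only that shared_from_this() is called after a shared_ptr already owns the object.

```cpp
#include <cassert>
#include <memory>

class Session : public std::enable_shared_from_this<Session> {
public:
    // Factory: the object is owned by a shared_ptr before init() runs.
    static std::shared_ptr<Session> create() {
        std::shared_ptr<Session> s(new Session);   // constructor runs first
        s->init();                                  // now shared_from_this() is valid
        return s;
    }
    // Number of shared_ptr owners observed during init().
    long owners_during_init() const { return owners_during_init_; }
private:
    Session() = default;   // private: forces callers through create()
    void init() {
        std::shared_ptr<Session> self = shared_from_this();
        owners_during_init_ = self.use_count();     // s in create() plus self
    }
    long owners_during_init_ = 0;
};
```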
I've passed references to shared_ptrs without trouble. Just make sure it's copied before it's saved (i.e., no references as class members).

Here are two things to avoid:
Calling the get() function to get the raw pointer and use it after the pointed-to object goes out of scope.
Passing around a reference or a raw pointer to a shared_ptr can be dangerous too, since it won't increment the internal count that helps keep the object alive.

We spent several weeks debugging strange behavior.
The reason was:
we passed 'this' to some thread workers instead of 'shared_from_this'.

Not precisely a footgun, but certainly a source of frustration until you wrap your head around how to do it the C++0x way: most of the predicates you know and love from <functional> don't play nicely with shared_ptr. Happily, std::tr1::mem_fn works with objects, pointers and shared_ptrs, replacing std::mem_fun, but if you want to use std::negate, std::not1, std::plus or any of those old friends with shared_ptr, be prepared to get cozy with std::tr1::bind and probably argument placeholders as well. In practice this is actually a lot more generic, since now you basically end up using bind for every function object adaptor, but it does take some getting used to if you're already familiar with the STL's convenience functions.
This DDJ article touches on the subject, with lots of example code. I also blogged about it a few years ago when I first had to figure out how to do it.

Using shared_ptr for really small objects (like char or short) can be a significant overhead if you have a lot of small objects on the heap that are not really "shared". boost::shared_ptr allocates 16 bytes for every new reference count it creates on g++ 4.4.3 and VS2008 with Boost 1.42. std::tr1::shared_ptr allocates 20 bytes. Now if you have a million distinct shared_ptr<char>, that means 20 million bytes of your memory are gone in holding just count=1. Not to mention the indirection costs and memory fragmentation. Try the following on your favorite platform.
void * operator new (size_t size) {
std::cout << "size = " << size << std::endl;
void *ptr = malloc(size);
if(!ptr) throw std::bad_alloc();
return ptr;
}
void operator delete (void *p) {
free(p);
}

Giving out a shared_ptr<T> to this inside a class definition is also dangerous.
Use enable_shared_from_this instead.
See the following post here

You need to be careful when you use shared_ptr in multithreaded code. It is relatively easy to end up in a situation where several shared_ptrs pointing to the same memory are used by different threads.

The popular widespread use of shared_ptr will almost inevitably cause unwanted and unseen memory occupation.
Cyclic references are a well-known cause, and some of them can be indirect and difficult to spot, especially in complex code that is worked on by more than one programmer; a programmer may decide that one object needs a reference to another as a quick fix and doesn't have time to examine all the code to see if he is closing a cycle. This hazard is hugely underestimated.
Less well understood is the problem of unreleased references. If an object is shared out to many shared_ptrs then it will not be destroyed until every one of them is zeroed or goes out of scope. It is very easy to overlook one of these references and end up with objects lurking unseen in memory that you thought you had finished with.
Although strictly speaking these are not memory leaks (it will all be released before the program exits) they are just as harmful and harder to detect.
These problems are the consequences of expedient false declarations:
1. Declaring what you really want to be single ownership as shared_ptr. scoped_ptr would be correct, but then any other reference to that object will have to be a raw pointer, which could be left dangling.
2. Declaring what you really want to be a passive observing reference as shared_ptr. weak_ptr would be correct, but then you have the hassle of converting it to shared_ptr every time you want to use it.
I suspect that your project is a fine example of the kind of trouble that this practice can get you into.
If you have a memory intensive application you really need single ownership so that your design can explicitly control object lifetimes.
With single ownership opObject=NULL; will definitely delete the object and it will do it now.
With shared ownership spObject=NULL; ........who knows?......
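The contrast can be made concrete with a counter. This sketch uses std::unique_ptr as the single-ownership pointer (an assumption; the answer mentions scoped_ptr), and the Object type, the counter and the function names are made up for illustration.

```cpp
#include <cassert>
#include <memory>

int live_objects = 0;   // Object instances currently alive

struct Object {
    Object() { ++live_objects; }
    ~Object() { --live_objects; }
};

// Single ownership: resetting the pointer destroys the object immediately.
int alive_after_unique_reset() {
    live_objects = 0;
    std::unique_ptr<Object> opObject(new Object);
    opObject.reset();            // same effect as opObject = nullptr
    return live_objects;
}

// Shared ownership: resetting only drops this owner's share.
int alive_after_shared_reset() {
    live_objects = 0;
    std::shared_ptr<Object> spObject = std::make_shared<Object>();
    std::shared_ptr<Object> forgotten = spObject;   // another owner somewhere else
    spObject.reset();
    return live_objects;         // the object lives on while forgotten exists
}
```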

If you have a registry of the shared objects (a list of all active instances, for example), the objects will never be freed. Solution: as in the case of circular dependency structures (see Kaz Dragon's answer), use weak_ptr as appropriate.
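A registry that observes its entries instead of owning them might look like the sketch below. Registry, Instance and prune are hypothetical names, and std::weak_ptr stands in for boost::weak_ptr.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Instance {};   // placeholder for the registered type

class Registry {
public:
    // Stores a weak_ptr, so registering does not extend the object's lifetime.
    void add(const std::shared_ptr<Instance>& p) { entries_.push_back(p); }

    // Drops entries whose objects are gone and returns how many remain.
    std::size_t prune() {
        entries_.erase(
            std::remove_if(entries_.begin(), entries_.end(),
                           [](const std::weak_ptr<Instance>& w) { return w.expired(); }),
            entries_.end());
        return entries_.size();
    }
private:
    std::vector<std::weak_ptr<Instance>> entries_;   // observes, does not own
};
```

Code that walks the registry would call lock() on each entry and skip the ones that return an empty shared_ptr.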

Smart pointers are not for everything, and raw pointers cannot be eliminated
Probably the worst danger is that since shared_ptr is a useful tool, people will start to put it everywhere. Since plain pointers can be misused, the same people will hunt raw pointers and try to replace them with strings, containers or smart pointers even when it makes no sense. Legitimate uses of raw pointers will become suspect. There will be a pointer police.
This is not only probably the worst danger, it may be the only serious danger. All the worst abuses of shared_ptr will be the direct consequence of the idea that smart pointers are superior to raw pointers (whatever that means), and that putting smart pointers everywhere will make C++ programming "safer".
Of course the mere fact that a smart pointer needs to be converted to a raw pointer to be used refutes this claim of the smart pointer cult, but the fact that the raw pointer access is "implicit" in operator*, operator-> (or explicit in get()), but not implicit in an implicit conversion, is enough to give the impression that this is not really a conversion, and that the raw pointer produced by this non-conversion is a harmless temporary.
C++ cannot be made a "safe language", and no useful subset of C++ is "safe"
Of course the pursuit of a safe subset ("safe" in the strict sense of "memory safe", as in LISP, Haskell, Java...) of C++ is doomed to be endless and unsatisfying, as the safe subset of C++ is tiny and almost useless, and unsafe primitives are the rule rather than the exception. Strict memory safety in C++ would mean no pointers and only references with automatic storage class. But in a language where the programmer is trusted by definition, some people will insist on using some (in principle) idiot-proof "smart pointer", even where there is no other advantage over raw pointers than that one specific way to screw up the program state is avoided.
