Rely on boolean value to access pointers - c++

I have a class called MyClass that contains a pointer to an object, like so:
class MyClass {
public:
MyClass() : _blob(nullptr) {}
~MyClass() { free(); }
void generate() {
if(!_blob)
_blob = new Blob;
}
void free() {
if(_blob) {
delete _blob;
_blob = nullptr;
}
}
Blob* getBlob() { return _blob; }
private:
Blob *_blob;
};
The reason why it holds a pointer to a heap allocated object and doesn't contain it on the stack is because some MyClass instances don't allocate data for this object, therefore before accessing it I need to check if it's not a null pointer:
if(myClass->getBlob())
myClass->getBlob()->doSomething();
Now I had this idea to store a bool variable called _hasBlob and then use it like this
if(myClass->hasBlob())
myClass->getBlob()->doSomething();
Is this considered faster and legit? Or is it considered bad practice because I can see the danger potential of somehow ending up dereferencing a nullptr.

It's legit but redundant. It's not faster. Infact, while the check itself is as fast, maintaining the boolean in sync with the validity of the pointer is marginally slower. The worst thing about it is the maintenance burden of proving that the boolean is always in sync. Once it's implemented and proven correct though, it's just redundant and a waste of memory.

In general, you would expect MyClass::getBlob() to always give you a valid object whenever you ask for it. So in general my suggestion would be to implement that method as follows:
Blob* getBlob() {
if(_blob == nullptr) {
_blob = new Blob;
// Or call private method generateBlob, if there is a lot of logic
}
return _blob;
}
Alternatively, if default-constructed Blobs are small and you don't have thouasands of MyClass instances, you could just create the Blob object in the constructor and have getBlob return a reference, so you never have to check anything once you have a valid MyClass object.
If you want to avoid this automatic creation of Blob objects, you could add a hasBlob check, but instead of keeping a separate boolean, I would just implement it as
bool hasBlob() const { return _blob != nullptr; }
This function is almost certain to be inlined, doesn't cost you any extra storage, and it guarantees the correct result (for instance, it can never happen that you set _hasBlob = true and then fail to allocate a Blob). As you say, there is still a risk of dereferencing a null pointer, but as you have clearly documented (hopefully) that getBlob may return a null pointer if there is no Blob allocated for this instance, that risk is now with the caller and they should take care to check the result, as with any function returning a pointer. In fact, this solution is exactly equivalent to the code you already had, since if(myObject->getBlob()) now does exactly the same check as if(myObject->hasBlob()) would -- the only difference being that perhaps the latter is slightly more self-documenting.
Since you indicate in the comments that you are worried about the performance: I suspect that checking a pointer against null is fairly fast, but as usual, if you want to be sure, the usual advice of "Measure it!" holds. For example, you may find that because of the extra check in my first version of getBlob the compiler will not inline the function.

Related

Avoiding null pointer crashes in C++ by overloading operators - bad practice?

I'm starting to write a rather large Qt application and instead of using raw pointers I want to use smart pointers, as well as Qt's own guarded pointer called QPointer.
With both standard library smart pointers and Qt's pointers the application crashes when a NULL pointer is dereferenced.
My idea was that I could add a custom overload to the dereference operators * and -> of these pointer types that check if the pointer is NULL.
Below is a short example that works fine so far. If a NULL pointer was dereferenced, a temporary dummy object would be created so that the application does not crash. How this dummy object would be processed might not be always correct, but at least there would be no crash and I could even react on this and show a warning or write it to a log file.
template <class T>
class Ptr : public std::shared_ptr<T> {
private:
T * m_temp;
public:
Ptr<T>(T * ptr) : std::shared_ptr<T>(ptr), m_temp(NULL) {}
~Ptr() {
if (m_temp) {
delete m_temp;
}
}
T * operator->() {
if (!std::shared_ptr<T>::get()) {
if (m_temp) {
delete m_temp;
}
m_temp = new T();
return m_temp;
} else {
return std::shared_ptr<T>::get();
}
}
T & operator*() {
return *operator->();
}
};
Of course I'll be doing NULL checks and try to eliminate the source of NULL pointers as much as possible, but for the rare case that it I forget a NULL check and the exception occurs, could this be a good way of handling it? Or is this a bad idea?
I would say this is a bad idea for a few reasons:
You cannot derive from standard library types. It may work until you change something benign in your code and then it breaks. There are various things you can do to make this more acceptable, but the easiest thing is to just not do this.
There are more ways to create a shared_ptr than just a constructor call. Duplicating the pointer value in your m_temp variable is likely just to lead things to be out of sync and cause more problems. By the time you cover all the bases, you will have probably re-implemented the whole shared_ptr class.
m_temp = new T(); seems like a frankly crazy thing to do if the old pointer is null. What about all the state stored in the object that was previously null? What about constructor parameters? Any initialization for the pointer? Sure, you could maybe handle all of these, but by that point you might as well handle the nullptr check elsewhere where things will be clearer.
You don't want to hide values being nullptr. If you have code using a pointer, it should care about the value of that pointer. If it is null and that is unexpected, then something further up the chain likely went wrong and you should be handling that appropriately (exceptions, error codes, logging, etc.). Silently allocating a new pointer will just hide the original source of the error. Whenever there is something wrong in a program, you want to stop or address the problem as close to the source as possible - it makes debugging the problem simpler.
A side note, if you are confident that your pointers are not null and don't want to have to deal with nullptr in a block of code, you may be able to use references instead. For example:
void fun1(MyObject* obj) {}
void fun2(MyObject& obj) {}
In fun1, the code might need to check for nullptr to be well written. In fun2, there is no need to check for nullptr because if someone converts a nullptr to a reference they have already broken the rules. fun2 pushes any responsibility for checking the pointer value higher up the stack. This can be good in some cases (just don't try and store the reference for later). Note that you can use operator * on a shared_ptr/unique_ptr to get a reference directly.

Reasons to return reference to std::unique_ptr

I wonder if there are any legitimate reasons to return unique pointers by reference in C++, i.e. std::unique_ptr<T>&?
I've never actually seen that trick before, but the new project I've got seems to use this pattern a lot. From the first glance, it just effectively breaks / circumvents the "unique ownership" contract, making it impossible to catch the error in compile-time. Consider the following example:
class TmContainer {
public:
TmContainer() {
// Create some sort of complex object on heap and store unique_ptr to it
m_time = std::unique_ptr<tm>(new tm());
// Store something meaningful in its fields
m_time->tm_year = 42;
}
std::unique_ptr<tm>& time() { return m_time; }
private:
std::unique_ptr<tm> m_time;
};
auto one = new TmContainer();
auto& myTime = one->time();
std::cout << myTime->tm_year; // works, outputs 42
delete one;
std::cout << myTime->tm_year; // obviously fails at runtime, as `one` is deleted
Note that if we'd returned just std::unique_ptr<tm> (not a reference), it would raise a clear compile-time error, or would force use to use move semantics:
// Compile-time error
std::unique_ptr<tm> time() { return m_time; }
// Works as expected, but m_time is no longer owned by the container
std::unique_ptr<tm> time() { return std::move(m_time); }
I suspect that a general rule of thumb is that all such cases warrant use of std::shared_ptr. Am I right?
There are two use cases for this and in my opinion it is indicative of bad design. Having a non-const reference means that you can steal the resource or replace it without having to offer separate methods.
// Just create a handle to the managed object
auto& tm_ptr = tm_container.time();
do_something_to_tm(*tm_ptr);
// Steal the resource
std::unique_ptr<TmContainer> other_tm_ptr = std::move(tm_ptr);
// Replace the managed object with another one
tm_ptr = std::make_unique<TmContainer>;
I strongly advocate against these practices because they are error prone and less readable. It's best to offer an interface such as the following, provided you actually need this functionality.
tm& time() { return *m_time; }
std::unique_ptr<tm> release_time() { return {std::move(m_time)}; }
// If tm is cheap to move
void set_time(tm t) { m_time = make_unique<tm>(std::move(t)); }
// If tm is not cheap to move or not moveable at all
void set_time(std::unique_ptr t_ptr) { m_time = std::move(t_ptr); }
This was too long of a comment. I don't have a good idea for requested use case. The only thing I imagine is some middle ware for a utility library.
The question is what do you want and need to model. What are the semantics. Returning reference does not anything useful that I know.
The only advantage in your example is no explicit destructor. If you wanted to make exact mirror of this code it'd be a raw pointer IMHO. What exact type one should use depends on the exact semantics that they want to model.
Perhaps the intention behind the code you show is to return a non-owning pointer, which idiomatically (as I have learned) is modeled through a raw pointer. If you need to guarantee the object is alive, then you should use shared_ptr but bear it mind, that it means sharing ownership - i.e. detaching it's lifetime from the TmContianer.
Using raw pointer would make your code fail the same, but one could argue, that there is no reason for explicit delete as well, and lifetime of the object can be properly managed through scopes.
This is of course debatable, as semantics and meaning of words and phrases is, but my experience says, that's how c++ people write, talk and understand the pointers.
std::unique_ptr does not satisfy the requirements of CopyConstructible or CopyAssignable as per design.
So the object has to be returned as a reference, if needed.

In c++11, is there ever still a need to pass in a reference to an object that will accept the output of a function?

Prior to C++11, if I had a function that operated on large objects, my instinct would be to write functions with this kind of prototype.
void f(A &return_value, A const &parameter_value);
(Here, return_value is just a blank object which will receive the output of the function. A is just some class which is large and expensive to copy.)
In C++11, taking advantage of move semantics, the default recommendation (as I understand it) is the more straightforward:
A f(A const &parameter_value);
Is there ever still a need to do it the old way, passing in an object to hold the return value?
Others have covered the case where A might not have a cheap move constructor. I'm assuming your A does. But there is still one more situation where you might want to pass in an "out" parameter:
If A is some type like vector or string and it is known that the "out" parameter already has resources (such as memory) that can be reused within f, then it makes sense to reuse that resource if you can. For example consider:
void get_info(std::string&);
bool process_info(const std::string&);
void
foo()
{
std::string info;
for (bool not_done = true; not_done;)
{
info.clear();
get_info(info);
not_done = process_info(info);
}
}
vs:
std::string get_info();
bool process_info(const std::string&);
void
foo()
{
for (bool not_done = true; not_done;)
{
std::string info = get_info();
not_done = process_info(info);
}
}
In the first case, capacity will build up in the string as the loop executes, and that capacity is then potentially reused on each iteration of the loop. In the second case a new string is allocated on every iteration (neglecting the small string optimization buffer).
Now this isn't to say that you should never return std::string by value. Just that you should be aware of this issue and apply engineering judgment on a case by case basis.
It is possible for an object to be large and expensive to copy, and for which move semantics cannot improve on copying. Consider:
struct A {
std::array<double,100000> m_data;
};
It may not be a good idea to design your objects this way, but if you have an object of this type for some reason and you want to write a function to fill the data in then you might do it using an out param.
It depends: does your compiler support return-value-optimization, and is your function f designed to be able to use the RVO your compiler supports?
If so, then yes, by all means return by value. You will gain nothing at all by passing a mutable parameter, and you'll gain a great deal of code clarity by doing it this way. If not, then you have to investigate the definition of A.
For some types, a move is nothing more than a copy. If A doesn't contain anything that is actually worth moving (pointers transferring ownership and so forth), then you're not going to gain anything by moving. A move isn't free, after all; it's simply a copy that knows that anything owned by the original is being transferred to the copy. If the type doesn't own anything, then a move is just a copy.

Memory Management with returning char* function

Today, without much thought, I wrote a simple function return to a char* based on a switch statement of given enum values. This, however, made me wonder how I could release that memory. What I did was something like this:
char* func()
{
char* retval = new char[20];
// Switch blah blah - will always return some value other than NULL since default:
return retval;
}
I apologize if this is a naive question, but what is the best way to release the memory seeing as I cannot delete the memory after the return and, obviously, if I delete it before, I won't have a returned value. What I was thinking as a viable solution was something like this
void func(char*& in)
{
// blah blah switch make it do something
}
int main()
{
char* val = new char[20];
func(val);
// Do whatever with func (normally func within a data structure with specific enum set so could run multiple times to change output)
delete [] val;
val = NULL;
return 0;
}
Would anyone have anymore insight on this and/or explanation on which to use?
Regards,
Dennis M.
You could write such functions in pair, like
Xyz* CreateXyz();
void DestroyXyz(Xyz *xyz);
Abc* NewAbc();
void DeleteAbc(Abc *abc);
Or you simply can transfer the responsibilty of deleting Xyz/Abc to the clients, i.e ones who call the function must also do delete on the returned object after using it.
Whatever you choose, make it clear in your documentation how the created object should be destroyed.
I would prefer pair-functions, especially if there is lot of things to consider before deleting!
By the way, you should prefer using std::string, instead of char*. Make use of STL as much as you can. They can solve most of your problems! The above advice is for situations where STL doesn't fit! In general, prefer STL!
Have you considered using an STL type or other class instead of returning a raw pointer? For instance, if your char * is a string, use std::string instead and avoid any risk of leaks:
std::string func()
{
std::string retval("");
// Switch blah blah - will always return some value other than NULL since default:
return retval;
}
If you plan to return raw pointers from a function, you must make really clear on the documentation whose is the responsibility of deleting the pointer, i.e. who owns it. In this case, you should state explicitly that the pointer ownership is transferred to the caller, who is responsible to delete it.
Although many people are OK with just specifying the ownership in the documentation, often it's better to enforce this policy in the code. In particular, smart pointers are often used for this: the current C++ standard provides the std::auto_ptr, a smart pointer which transfers ownership on copy, i.e. when you return it to the caller, you're transferring the ownership to the target std::auto_ptr. Notice that std::auto_ptr automagically deletes the pointed memory on its destruction if it still owns it. The upcoming C++ standard provides std::unique_ptr that works in a similar fashion, but using move semantic.
Unfortunately, std::auto_ptr isn't thought for arrays (which need delete [] instead of delete), so you cannot use for your your purpose. I think that the decision of not including an auto_ptr for arrays was made deliberately, because the STL already provides all the containers you may need if you need to return a collection of items which handle by their own memory management and copy.
In particular, for strings you should simply use std::string and forget completely about this kind of memory management and pointer ownership problems.
In this case whoever calls the func() should release the memory when its not needed. But the correct deletion happens like this:
delete val;
val = NULL;

C++ best practice: Returning reference vs. object

I'm trying to learn C++, and trying to understand returning objects. I seem to see 2 ways of doing this, and need to understand what is the best practice.
Option 1:
QList<Weight *> ret;
Weight *weight = new Weight(cname, "Weight");
ret.append(weight);
ret.append(c);
return &ret;
Option 2:
QList<Weight *> *ret = new QList();
Weight *weight = new Weight(cname, "Weight");
ret->append(weight);
ret->append(c);
return ret;
(of course, I may not understand this yet either).
Which way is considered best-practice, and should be followed?
Option 1 is defective. When you declare an object
QList<Weight *> ret;
it only lives in the local scope. It is destroyed when the function exits. However, you can make this work with
return ret; // no "&"
Now, although ret is destroyed, a copy is made first and passed back to the caller.
This is the generally preferred methodology. In fact, the copy-and-destroy operation (which accomplishes nothing, really) is usually elided, or optimized out and you get a fast, elegant program.
Option 2 works, but then you have a pointer to the heap. One way of looking at C++ is that the purpose of the language is to avoid manual memory management such as that. Sometimes you do want to manage objects on the heap, but option 1 still allows that:
QList<Weight *> *myList = new QList<Weight *>( getWeights() );
where getWeights is your example function. (In this case, you may have to define a copy constructor QList::QList( QList const & ), but like the previous example, it will probably not get called.)
Likewise, you probably should avoid having a list of pointers. The list should store the objects directly. Try using std::list… practice with the language features is more important than practice implementing data structures.
Use the option #1 with a slight change; instead of returning a reference to the locally created object, return its copy.
i.e. return ret;
Most C++ compilers perform Return value optimization (RVO) to optimize away the temporary object created to hold a function's return value.
In general, you should never return a reference or a pointer. Instead, return a copy of the object or return a smart pointer class which owns the object. In general, use static storage allocation unless the size varies at runtime or the lifetime of the object requires that it be allocated using dynamic storage allocation.
As has been pointed out, your example of returning by reference returns a reference to an object that no longer exists (since it has gone out of scope) and hence are invoking undefined behavior. This is the reason you should never return a reference. You should never return a raw pointer, because ownership is unclear.
It should also be noted that returning by value is incredibly cheap due to return-value optimization (RVO), and will soon be even cheaper due to the introduction of rvalue references.
passing & returning references invites responsibilty.! u need to take care that when you modify some values there are no side effects. same in the case of pointers. I reccomend you to retun objects. (BUT IT VERY-MUCH DEPENDS ON WHAT EXACTLY YOU WANT TO DO)
In ur Option 1, you return the address and Thats VERY bad as this could lead to undefined behaviour. (ret will be deallocated, but y'll access ret's address in the called function)
so use return ret;
It's generally bad practice to allocate memory that has to be freed elsewhere. That's one of the reasons we have C++ rather than just C. (But savvy programmers were writing object-oriented code in C long before the Age of Stroustrup.) Well-constructed objects have quick copy and assignment operators (sometimes using reference-counting), and they automatically free up the memory that they "own" when they are freed and their DTOR automatically is called. So you can toss them around cheerfully, rather than using pointers to them.
Therefore, depending on what you want to do, the best practice is very likely "none of the above." Whenever you are tempted to use "new" anywhere other than in a CTOR, think about it. Probably you don't want to use "new" at all. If you do, the resulting pointer should probably be wrapped in some kind of smart pointer. You can go for weeks and months without ever calling "new", because the "new" and "delete" are taken care of in standard classes or class templates like std::list and std::vector.
One exception is when you are using an old fashion library like OpenCV that sometimes requires that you create a new object, and hand off a pointer to it to the system, which takes ownership.
If QList and Weight are properly written to clean up after themselves in their DTORS, what you want is,
QList<Weight> ret();
Weight weight(cname, "Weight");
ret.append(weight);
ret.append(c);
return ret;
As already mentioned, it's better to avoid allocating memory which must be deallocated elsewhere. This is what I prefer doing (...these days):
void someFunc(QList<Weight *>& list){
// ... other code
Weight *weight = new Weight(cname, "Weight");
list.append(weight);
list.append(c);
}
// ... later ...
QList<Weight *> list;
someFunc(list)
Even better -- avoid new completely and using std::vector:
void someFunc(std::vector<Weight>& list){
// ... other code
Weight weight(cname, "Weight");
list.push_back(weight);
list.push_back(c);
}
// ... later ...
std::vector<Weight> list;
someFunc(list);
You can always use a bool or enum if you want to return a status flag.
Based on experience, do not use plain pointers because you can easily forget to add proper destruction mechanisms.
If you want to avoid copying, you can go for implementing the Weight class with copy constructor and copy operator disabled:
class Weight {
protected:
std::string name;
std::string desc;
public:
Weight (std::string n, std::string d)
: name(n), desc(d) {
std::cout << "W c-tor\n";
}
~Weight (void) {
std::cout << "W d-tor\n";
}
// disable them to prevent copying
// and generate error when compiling
Weight(const Weight&);
void operator=(const Weight&);
};
Then, for the class implementing the container, use shared_ptr or unique_ptr to implement the data member:
template <typename T>
class QList {
protected:
std::vector<std::shared_ptr<T>> v;
public:
QList (void) {
std::cout << "Q c-tor\n";
}
~QList (void) {
std::cout << "Q d-tor\n";
}
// disable them to prevent copying
QList(const QList&);
void operator=(const QList&);
void append(T& t) {
v.push_back(std::shared_ptr<T>(&t));
}
};
Your function for adding an element would make use or Return Value Optimization and would not call the copy constructor (which is not defined):
QList<Weight> create (void) {
QList<Weight> ret;
Weight& weight = *(new Weight("cname", "Weight"));
ret.append(weight);
return ret;
}
On adding an element, the let the container take the ownership of the object, so do not deallocate it:
QList<Weight> ql = create();
ql.append(*(new Weight("aname", "Height")));
// this generates segmentation fault because
// the object would be deallocated twice
Weight w("aname", "Height");
ql.append(w);
Or, better, force the user to pass your QList implementation only smart pointers:
void append(std::shared_ptr<T> t) {
v.push_back(t);
}
And outside class QList you'll use it like:
Weight * pw = new Weight("aname", "Height");
ql.append(std::shared_ptr<Weight>(pw));
Using shared_ptr you could also 'take' objects from collection, make copies, remove from collection but use locally - behind the scenes it would be only the same only object.
All of these are valid answers, avoid Pointers, use copy constructors, etc. Unless you need to create a program that needs good performance, in my experience most of the performance related problems are with the copy constructors, and the overhead caused by them. (And smart pointers are not any better on this field, I'd to remove all my boost code and do the manual delete because it was taking too much milliseconds to do its job).
If you're creating a "simple" program (although "simple" means you should go with java or C#) then use copy constructors, avoid pointers and use smart pointers to deallocate the used memory, if you're creating a complex programs or you need a good performance, use pointers all over the place, and avoid copy constructors (if possible), just create your set of rules to delete pointers and use valgrind to detect memory leaks,
Maybe I will get some negative points, but I think you'll need to get the full picture to take your design choices.
I think that saying "if you're returning pointers your design is wrong" is little misleading. The output parameters tends to be confusing because it's not a natural choice for "returning" results.
I know this question is old, but I don't see any other argument pointing out the performance overhead of that design choices.