Implementing Containers using Smart Pointers


Ok, so everyone knows that raw pointers should be avoided like the plague and to prefer smart pointers, but does this advice apply when implementing a container? This is what I am trying to accomplish:
template<typename T> class AVLTreeNode {
public:
T data;
unique_ptr<AVLTreeNode<T>> left, right;
int height;
};
unique_ptr can make container functions more cumbersome to write, because I can't elegantly have multiple pointers temporarily pointing to the same object. For example:
unique_ptr<AVLTreeNode<T>> rotate_right(unique_ptr<AVLTreeNode<T>> n1)
{
// unique_ptr is move-only, so each ownership transfer needs std::move
unique_ptr<AVLTreeNode<T>> n2 = std::move(n1->left);
n1->left = std::move(n2->right);
n2->right = std::move(n1);
// n1 is now empty and must be referenced through the longer name n2->right from now on
n2->right->recalculate_height();
n2->recalculate_height();
return n2;
}
(It's not a big deal in this example, but I can imagine how it could become a problem.) Should I take problems like these as a strong hint that containers should be implemented with good old new, delete, and raw pointers? It seems like an awful lot of trouble just to avoid writing a destructor.

I do not usually use smart pointers when implementing containers as you show. Raw pointers (imho) are not to be avoided like the plague. Use a smart pointer when you want to enforce memory ownership. But typically in a container, the container owns the memory pointed to by the pointers making up the data structure.
If in your design, an AVLTreeNode uniquely owns its left and right children and you want to express that with unique_ptr, that's fine. But if you would prefer that AVLTree owns all AVLTreeNodes, and does so with raw pointers, that is just as valid (and is the way I usually code it).
Trust me, I'm not anti-smart-pointer. I am the one who invented unique_ptr. But unique_ptr is just another tool in the tool box. Having good smart pointers in the tool box is not a cure-all, and using them blindly for everything is not a substitute for careful design.
Update to respond to comment (comment box was too small):
I use raw pointers a lot (which are rarely owning). A good sampling of my coding style exists in the open source project libc++. One can browse the source under the "Browse SVN" link.
I prefer that every allocation of a resource be deallocate-able in a destructor somewhere, because of exception safety concerns, even if the usual deallocation happens outside of a destructor. When the allocation is owned by a single pointer, a smart pointer is typically the most convenient tool in the tool box. When the allocation is owned by something larger than a pointer (e.g. a container, or a class Employee), raw pointers are often a convenient part of the data structure composing the larger object.
The most important thing is that I never allocate any resource without knowing what object owns that resource, be it smart pointer, container, or whatever.
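As a rough illustration of the "container owns the nodes" design described above, here is a minimal sketch. The class name Tree and its members are hypothetical, and it is a plain unbalanced binary search tree rather than an AVL tree, to keep it short. The links are raw pointers, and the single owner (the container) releases everything in its destructor:

```cpp
#include <cstddef>

// Hypothetical example: an unbalanced binary search tree whose nodes are
// owned by the container as a whole, not by individual smart pointers.
template <typename T>
class Tree {
    struct Node {
        T data;
        Node* left;
        Node* right;
    };

    Node* root_ = nullptr;
    std::size_t size_ = 0;

    // Recursively frees a subtree; called only from the destructor.
    static void destroy(Node* n) noexcept {
        if (n == nullptr) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }

public:
    Tree() = default;
    Tree(const Tree&) = delete;             // copying omitted in this sketch
    Tree& operator=(const Tree&) = delete;
    ~Tree() { destroy(root_); }             // every allocation is released here

    void insert(const T& value) {
        Node** link = &root_;               // walk down to the empty link
        while (*link != nullptr) {
            link = (value < (*link)->data) ? &(*link)->left : &(*link)->right;
        }
        *link = new Node{value, nullptr, nullptr};
        ++size_;
    }

    bool contains(const T& value) const {
        const Node* n = root_;
        while (n != nullptr) {
            if (value < n->data) n = n->left;
            else if (n->data < value) n = n->right;
            else return true;
        }
        return false;
    }

    std::size_t size() const { return size_; }
};
```

The raw pointers here never own anything on their own; the Tree object does, which is the ownership question the answer says must always have a known answer.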

The code you presented compiles with no problems
#include <memory>
template<typename T> class AVLTreeNode {
public:
T data;
std::unique_ptr<AVLTreeNode<T>> left, right;
int height;
};
int main()
{
AVLTreeNode<int> node;
}
test compilation: https://ideone.com/aUAHs
Personally, I've been using smart pointers for trees even when the only thing we had was std::auto_ptr.
As for rotate_right, it could be implemented with a couple of calls to unique_ptr::swap.
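One way the swap-based rotation might look is sketched below. This is an assumption about what the answer had in mind, not its actual code: the node constructor is added for convenience, the root is taken by reference instead of by value, and height recalculation is left out.

```cpp
#include <memory>
#include <utility>

template <typename T>
struct AVLTreeNode {
    T data;
    std::unique_ptr<AVLTreeNode<T>> left, right;
    int height = 1;
    explicit AVLTreeNode(T value) : data(std::move(value)) {}
};

// Rotates the subtree owned by `root` to the right, in place.
// Precondition: root and root->left are both non-null.
template <typename T>
void rotate_right(std::unique_ptr<AVLTreeNode<T>>& root) {
    AVLTreeNode<T>* old_root = root.get();  // non-owning temporary alias

    // root now owns the old left child; old_root->left temporarily owns old_root.
    root.swap(old_root->left);

    // The new root's right link takes over old_root, and old_root->left
    // receives the new root's former right subtree.
    root->right.swap(old_root->left);
}
```

The raw pointer old_root is only a short-lived name for a node that some unique_ptr owns at every step, so ownership never becomes ambiguous and nothing is deleted during the rotation.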

Small correction: raw pointers should not be avoided like the plague (so not everybody knows that "fact"); it is manual memory management that should be avoided when possible, by using containers instead of dynamic arrays, or smart pointers instead of new and delete. So in your function, just call get() on your unique_ptr to obtain a raw pointer for temporary storage.

std::shared_ptr does not have these restrictions. In particular, multiple shared_ptr instances can reference the same object.

Herb Sutter has very clear guidelines about not using shared_ptr as a parameter in his GotW series:
Guideline: Don’t pass a smart pointer as a function parameter unless
you want to use or manipulate the smart pointer itself, such as to
share or transfer ownership.
and this...
Guideline: Prefer passing objects by value, *, or &, not by smart
pointer.
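A small sketch of how those two guidelines might translate into signatures follows. Widget, Registry, and the function names are hypothetical and exist only for illustration:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

struct Widget {  // hypothetical type
    std::string name;
};

// Only uses the object, which must exist: take a reference.
std::size_t name_length(const Widget& w) { return w.name.size(); }

// Only uses the object, which may be absent: take a raw pointer.
std::size_t name_length_or_zero(const Widget* w) {
    return w ? w->name.size() : 0;
}

// Takes ownership: this is the case where a smart pointer parameter fits.
class Registry {
public:
    void adopt(std::unique_ptr<Widget> w) { owned_ = std::move(w); }
    const Widget* current() const { return owned_.get(); }

private:
    std::unique_ptr<Widget> owned_;
};
```

A caller holding a std::unique_ptr<Widget> w would write name_length(*w) or name_length_or_zero(w.get()), and registry.adopt(std::move(w)) only when it really intends to give the object away.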

Related

Smart pointer concepts ownership and lifetime

There are two concepts (ownership, lifetime) that are important when using C++ smart pointers (unique, shared, weak). I try to understand those concepts and how they influence smart pointer (or raw pointer) usage.
I read two rules:
Always use smart pointers to manage ownership/lifetime of dynamic
objects.
Don't use smart pointers when not managing ownership/lifetime.
An example:
class Object
{
public:
Object* child(int i) { return mChildren[i]; }
// More search and access functions returning pointers here
private:
vector<Object*> mChildren;
};
I want to rewrite this using smart pointers. Let's ignore child() first. Easy game. A parent owns its children. So make mChildren a vector of unique_ptr.
According to the above rules, some people argue child(i) should continue returning a raw pointer.
But isn't this risky? Someone could do stupid things like deleting the returned object, causing a hard-to-debug crash... which could be avoided by using a weak_ptr or a shared_ptr as the return value.
Can't one say that copying a pointer always means to temporarily share the ownership and/or to assert the lifetime of the object?
Is it worth using smart pointers for children only if I do not get a safer API as well?

You could return a const std::unique_ptr<Object>&, which would give you the same semantics as a raw pointer for calling methods on it, while preventing the caller from resetting or releasing the owning pointer.
Using std::unique_ptr together with raw pointers makes sense when you know that the owner will outlive any raw pointer and you are sure that people won't try to delete the pointer directly. That's different from using std::weak_ptr and std::shared_ptr, because those won't let you use a dangling pointer at all.
There's always room to get something wrong, so the answer really depends on the specific situation: where this code is going to be used and so on.
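For comparison, here is a minimal sketch of the variant the question describes, with the parent owning its children through unique_ptr and child() still handing out a non-owning raw pointer. The id member, addChild, childCount, and the bounds check are assumptions added to make the example self-contained:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

class Object {
public:
    explicit Object(int id) : mId(id) {}

    int id() const { return mId; }

    // Creates a child owned by this object and returns a non-owning pointer.
    Object* addChild(int id) {
        mChildren.push_back(std::unique_ptr<Object>(new Object(id)));
        return mChildren.back().get();
    }

    // Non-owning access: callers may use the child but must not delete it.
    // Returns nullptr when the index is out of range.
    Object* child(std::size_t i) {
        return i < mChildren.size() ? mChildren[i].get() : nullptr;
    }

    std::size_t childCount() const { return mChildren.size(); }

private:
    int mId;
    std::vector<std::unique_ptr<Object>> mChildren;  // the parent owns its children
};
```

The returned pointers stay valid for as long as the parent keeps the child, which is the lifetime assumption the answer above points out.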

Writing more general pointer code

Assume that I want to write a function that takes in a pointer. However, I want to allow the caller to use naked pointers or smart pointers, whichever they prefer. This should be good because my code should rely on pointer semantics, not on how pointers are actually implemented. This is one way to do this:
template<typename MyPtr>
void doSomething(MyPtr p)
{
//store pointer for later use
this->var1 = p;
//do something here
}
The above uses duck typing, so one can pass naked pointers or smart pointers. The problem occurs when the passed value is a base pointer and we need to see if we can cast it to a derived type.
template<typename BasePtr, typename DerivedPtr>
void doSomething(BasePtr b)
{
auto d = dynamic_cast<DerivedPtr>(b);
if (d) {
this->var1 = d;
//do some more things here
}
}
The above code will work for raw pointers but won't work for smart pointers, because I need to use dynamic_pointer_cast instead of dynamic_cast.
One solution to the above problem is to add a new utility method, something like universal_dynamic_cast, that works on both raw pointers and smart pointers by selecting an overloaded version using std::enable_if.
The questions I have are:
Is there value in adding all this complexity so that the code supports raw as well as smart pointers? Or should we just use shared_ptr in our library's public APIs? I know this depends on the purpose of the library, but what is the general feeling about using shared_ptr all over API signatures? Assume that we only have to support C++11.
Why doesn't the STL have built-in pointer casts that are agnostic of whether you pass raw pointers or smart pointers? Is this intentional on the part of the STL designers or just an oversight?
One other problem with the above approach is the loss of intellisense and a bit of readability. This is obviously a problem in all duck-typed code. In C++, however, we have a choice. I could easily have strongly typed my argument above, like shared_ptr<MyBase>, which would sacrifice the flexibility for callers to pass whatever they have wrapped in whatever pointer, but readers of my code would be more confident and could build a better model of what should be coming in. In C++ public library APIs, are there general preferences or advantages one way or another?
There is one more approach I have seen in another SO answer, where the author proposed that you should just use template<typename T> and let the caller decide if T is some pointer type, a reference, or a class. This super generic approach obviously doesn't work if I have to call something on T, because C++ requires dereferencing pointer types, which means I would probably have to create a utility method like universal_deref using std::enable_if that applies the * operator to pointer types but does nothing for plain objects. I wonder if there are any design patterns that allow this super generic approach more easily. Again, above all, is it worth going to all this trouble, or should I just keep things simple and use shared_ptr everywhere?
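For reference, the universal_dynamic_cast helper mentioned in the question could be sketched roughly as follows. This version is an assumption, not an existing library facility: it uses two plain overloads instead of std::enable_if, and it takes the target class as the template argument instead of the target pointer type. Base, Derived, and Other are hypothetical types.

```cpp
#include <memory>

// Raw pointer overload: forwards to dynamic_cast.
template <typename To, typename From>
To* universal_dynamic_cast(From* p) {
    return dynamic_cast<To*>(p);
}

// shared_ptr overload: forwards to std::dynamic_pointer_cast, so the result
// shares ownership with the argument.
template <typename To, typename From>
std::shared_ptr<To> universal_dynamic_cast(const std::shared_ptr<From>& p) {
    return std::dynamic_pointer_cast<To>(p);
}

// Hypothetical polymorphic hierarchy used only for illustration.
struct Base {
    virtual ~Base() = default;
};
struct Derived : Base {
    int value = 42;
};
struct Other : Base {};
```

With this, universal_dynamic_cast<Derived>(b) yields a Derived* when b is a Base*, and a std::shared_ptr<Derived> when b is a std::shared_ptr<Base>; both results are null when the cast fails.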

To store a shared_ptr within a class has a semantic meaning. It means that the class is now claiming ownership of that object: the responsibility for its destruction. In the case of shared_ptr, you are potentially sharing that responsibility with other code.
To store a naked T*... well, that has no clear meaning. The Core C++ Guidelines tell us that naked pointers should not be used to represent object ownership, but other people do different things.
Under the core guidelines, what you are talking about is a function that may or may not claim ownership of an object, based on how the user calls it. I would say that you have a very confused interface. Ownership semantics are usually part of the fundamental structure of code. A function either takes ownership or it does not; it's not something that gets determined based on where it gets called.
However, there are times (typically for optimization reasons) where you might need this. Where you might have an object that in one instance is given ownership of memory and in another instance is not. This typically crops up with strings, where some users will allocate a string that you should clean up, and other users will get the string from static data (like a literal), so you don't clean it up.
In those cases, I would say that you should develop a smart pointer type which has this specific semantics. It can be constructed from a shared_ptr<T> or a T*. Internally, it would probably use a variant<shared_ptr<T>, T*> or a similar type if you don't have access to variant.
Then you could give it its own dynamic/static/reinterpret/const_pointer_cast functions, which would forward the operation as needed, based on the status of the internal variant.
Alternatively, shared_ptr instances can be given a deleter object that does nothing. So if your interface just uses shared_ptr, the user can choose to pass an object that it technically does not truly own.
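A minimal sketch of the do-nothing deleter idea is shown below. Settings, Logger, and the helper non_owning are hypothetical names chosen for this example:

```cpp
#include <memory>
#include <utility>

struct Settings {  // hypothetical type
    int verbosity = 1;
};

// An interface that is written purely in terms of shared_ptr.
class Logger {
public:
    explicit Logger(std::shared_ptr<Settings> s) : settings_(std::move(s)) {}
    int verbosity() const { return settings_->verbosity; }

private:
    std::shared_ptr<Settings> settings_;
};

// Wraps a pointer in a shared_ptr whose deleter does nothing, so the
// shared_ptr never destroys the object. The caller must keep the object
// alive for as long as any copy of the returned shared_ptr is in use.
template <typename T>
std::shared_ptr<T> non_owning(T* p) {
    return std::shared_ptr<T>(p, [](T*) {});
}
```

A caller with a stack or static Settings object could then write Logger log(non_owning(&settings)). The cost is that the type no longer tells the reader whether ownership is real, so this is best kept for the rare cases the answer describes.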

The usual solution is
template<typename T>
void doSomething(T& p)
{
//store reference for later use
this->var1 = &p;
}
This decouples the type I use internally from the representation used by the caller. Yes, there's a lifetime issue, but that's unavoidable. I cannot enforce a lifetime policy on my caller and at the same time accept any pointer. If I want to ensure the object stays alive, I must change the interface to std::shared_ptr<T>.

I think the solution you want is to force callers of your function to pass a regular pointer rather than using a template function. Using shared_ptrs is a good practice, but provides no benefit in passing along the stack, since the object is already held in a shared pointer by the caller of your function, guaranteeing it does not get destroyed, and your function isn't really "holding on" to the object. Use shared_ptrs when storing as a member (or when instantiating the object that will become stored in a member), but not when passing as an argument. It should be a simple matter for the caller to get a raw pointer from the shared_ptr anyway.

The purpose of smart pointers
The purpose of smart pointers is to manage memory resources. When you have a smart pointer, then you usually claim unique or shared ownership. On the other hand, raw pointers just point to some memory that is managed by someone else. Having a raw pointer as a function parameter basically tells the caller of the function that the function is not caring about the memory management. It can be stack memory or heap memory. It does not matter. It only needs to outlive the lifetime of the function call.
Semantics of pointer parameters
When passing a unique_ptr to a function (by value), you're passing the responsibility to clean up the memory to that function. When passing a shared_ptr or weak_ptr to a function, that's saying "I'll possibly share memory ownership with that function or the object it belongs to". That's quite different from passing a raw pointer, which implicitly means "Here's a pointer. You can access it until you return (unless specified otherwise)".
Conclusion
If you have a function, then you usually know which kind of ownership semantics you have, and 98% of the time you don't care about ownership and should just stick to raw pointers, or even just references if you know that the pointer you're passing is never a nullptr anyway. Callers that have smart pointers can use the p.get() member function, or &*p if they want to be more terse. Therefore, I would not recommend templating your code to tackle this problem, since raw pointers give the caller all the flexibility you can get. Avoiding templates also allows you to put your implementation into an implementation file (and not into a header file).
To answer your concrete questions:
I don't see much value in adding this complexity. To the contrary: It complicates your code unnecessarily.
There is hardly any need for this. Even if you use std::dynamic_pointer_cast and the like, it is to maintain ownership in some way. However, adequate uses of this are rare, because most of the time just using dynamic_cast<U*>(ptr.get()) is all you need. That way you avoid the overhead of shared ownership management.
My preference would be: Use raw pointers. You get all the flexibility, intellisense and so forth and you will live happily ever after.
I would rather call this an antipattern - a pattern that should not be used. If you want to be generic, then use raw pointers (if they are nullable) or references, if the pointer parameter would never be a nullptr. This gives the caller all the flexibility while keeping the interface clean and simple.
Further reading: Herb Sutter talked about smart pointers as function parameters in his Guru of the Week #91. He explains the topic in depth there. Especially point 3 might be interesting to you.
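To make the recommendation concrete, here is a small sketch of a function that takes a non-owning raw pointer and downcasts it with a plain dynamic_cast. Shape, Circle, Square, and describe are hypothetical names:

```cpp
#include <memory>
#include <string>

// Hypothetical hierarchy used only for illustration.
struct Shape {
    virtual ~Shape() = default;
};
struct Circle : Shape {
    double radius = 1.0;
};
struct Square : Shape {
    double side = 2.0;
};

// Takes a non-owning, possibly null pointer. It works the same way whether
// the caller keeps the object in a unique_ptr, a shared_ptr, or on the stack.
std::string describe(const Shape* s) {
    if (s == nullptr) return "nothing";
    if (dynamic_cast<const Circle*>(s) != nullptr) return "circle";
    if (dynamic_cast<const Square*>(s) != nullptr) return "square";
    return "shape";
}
```

Callers with smart pointers pass u.get() or sh.get(), and callers with plain objects pass &obj. No reference count is touched along the way.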

After reviewing some more material, I've finally decided to use plain old raw pointers in my public interface. Here is the reasoning:
We shouldn't be designing interfaces to accommodate the bad design decisions of others. The mantra of "avoid raw pointers like the plague and replace them with smart pointers everywhere" is just bad advice (also see Sutter's GotW). Trying to support those bad decisions spreads them into your own code.
Raw pointers explicitly set up a contract with callers that they are the ones who need to worry about the lifetime of inputs.
Raw pointers give maximum flexibility to callers who have shared_ptr, unique_ptr, or just raw pointers.
Code now looks much more readable, intuitive and reasonable unlike those duck typed templates taking over everywhere.
I get my strong typing back along with intellisense and better compile time checks.
Casting up and down the hierarchy is a breeze, and I don't have to worry about perf implications where a new instance of a smart pointer may get created at each cast.
While passing pointers around internally, I don't have to care whether the pointer is a shared_ptr or a raw pointer.
Although I don't care about it, there is a better pathway to support older compilers.
In short, trying to accommodate potential clients who have taken up the guideline of never using raw pointers and replacing them with smart pointers everywhere pollutes my code with unnecessary complexity. So keep simple things simple and just use raw pointers unless you explicitly want ownership.

Linked list implementation using Templates in C++

I am new to C++ and have been studying data structures lately. I have created linked lists in the following way:
class Random{
private:
struct Node{
int data;
Node* next;
};
};
But I came across a piece of code that is doing the same thing in the following way:
template<typename T>
struct ListNode {
T data;
shared_ptr<ListNode<T>> next;
};
I looked it up and found that we use templates when we want to have multiple data types. Like now we can use int, double, float instead of "T". Whereas in the former case we could only use int. However, I don't understand how:
Node* next
is the same as:
shared_ptr<ListNode<T>> next
and how these will be used. I know that for the former we use:
node->next = new Node;
node->data = randomdata;
How does it work for the latter? Also, of the two implementations, which one is better and why?

The T* ptr; form is the basic way of declaring a pointer to memory holding a value of type T. This kind of pointer is initialized with the base address of an array T[], with new T(), with new T[], or with something else.
As you can see, there are many ways to allocate the memory the pointer is pointing to. This is one of the pitfalls when it comes to freeing that memory. Should you use delete, delete[], or are we pointing to memory not even allocated by us?
What if we forget to free the allocated memory, or try to access memory already freed?
=> With raw pointers, bugs can occur easily!
Here smart pointers come to the rescue! Smart pointers like std::unique_ptr and std::shared_ptr encapsulate these raw pointers for us and handle type-safe memory management. Thus, when a unique_ptr goes out of scope, the memory it owns is automatically freed. The same holds for a shared_ptr once no other shared_ptr refers to the same object.
I would always recommend using C++'s smart pointers where possible!
Which kind of smart pointer you should use depends on the kind of linked list you want to implement (e.g. if circular lists are supported too).
btw. have you thought about std::vector or std::list?
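As one possible answer to "how does it work" with smart pointers, here is a minimal sketch of a singly linked list whose links are std::unique_ptr (chosen here instead of shared_ptr because each node has exactly one owner). The class name ForwardList and its members are hypothetical:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

template <typename T>
class ForwardList {
    struct Node {
        T data;
        std::unique_ptr<Node> next;  // each node owns its successor
        explicit Node(T value) : data(std::move(value)) {}
    };

    std::unique_ptr<Node> head_;
    std::size_t size_ = 0;

public:
    // Unlink nodes one at a time so that destroying a very long list does
    // not recurse once per node through the unique_ptr destructors.
    ~ForwardList() {
        while (head_) head_ = std::move(head_->next);
    }

    void push_front(T value) {
        std::unique_ptr<Node> node(new Node(std::move(value)));
        node->next = std::move(head_);  // the new node takes over the old chain
        head_ = std::move(node);
        ++size_;
    }

    // Precondition for front() and pop_front(): the list is not empty.
    const T& front() const { return head_->data; }

    void pop_front() {
        head_ = std::move(head_->next);  // the old head is deleted here
        --size_;
    }

    bool empty() const { return head_ == nullptr; }
    std::size_t size() const { return size_; }
};
```

There is no delete anywhere: nodes are released when the unique_ptr that owns them is reassigned or destroyed. The statement head_ = std::move(head_->next) plays the role that advancing a raw pointer and deleting the old node plays in the raw-pointer version.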

The second form is a type of "smart" pointer. Most code using modern C++ should be using them.
Using raw (non-smart) pointers, you have to remember to pair new/delete or new[]/delete[] when the object goes out of scope. In the simple case of a constructor/destructor it's not that much of a burden. But when you are using pointers in a function and that function throws an exception, it gets a bit tricky to free things up.
There is more than one type of smart pointer: unique, shared, and weak. Unique pointers are for one-off objects that are only used in one place (like an object or function). Shared pointers are for cases where multiple objects are using the same pointer/resource and you only want the allocated object's destructor called when the last owner of the pointer goes out of scope. Weak pointers are for cases where the resource is managed by someone else and the pointed-to resource should live on after the object holding the weak pointer goes out of scope. They are also needed for breaking reference cycles, which would otherwise keep the reference count above zero forever and leak memory (C++ has no garbage collector to reclaim such cycles).
Smart pointers are a good thing; you should read up on them (Stroustrup's book is great). There are few cases now where naked pointers are needed.
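The cycle problem mentioned above can be illustrated with a small sketch. Parent, Child, and make_family are hypothetical names; the point is only that the back-reference is a weak_ptr, so it does not keep the parent alive:

```cpp
#include <memory>

struct Parent;

struct Child {
    std::weak_ptr<Parent> parent;  // non-owning back-reference
};

struct Parent {
    std::shared_ptr<Child> child;  // owning forward reference
};

// Builds a parent with one child that points back at it.
std::shared_ptr<Parent> make_family() {
    std::shared_ptr<Parent> p = std::make_shared<Parent>();
    p->child = std::make_shared<Child>();
    p->child->parent = p;  // does not increase the parent's use count
    return p;
}
```

If Child::parent were a shared_ptr instead, the parent and child would keep each other alive after the last outside reference disappeared, and neither destructor would ever run.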

As Karoly Horvath said, it's not the same thing:
T* is a "plain" pointer to objects of type T, it stores an address in memory, and implicitly the type of the object that we can expect to find at this address (which is useful to know e.g. the size of the target memory).
std::shared_ptr<T> is an object that belongs to the category of "smart-pointers", which are called "smart" because they can manage the memory that is being pointed, by keeping track of how many references exist to that memory location. This means in practical terms that dynamically allocated memory will be released for you when it is no longer used by your code at runtime.
I would say that for a simple linked-list (singly or doubly linked), there is no need to use shared_ptrs. It might be useful e.g. for graphs with dynamic recurrent structure. For genericity sake though, it is better to use a templated version of your node:
template <typename T>
struct ListNode
{
T data;
ListNode<T> *next;
};

First, let's get some junk out of the way:
It is good to learn and understand the raw implementations of linked lists, as you will encounter them (for many good or bad reasons) in production code.
If you *have* to use 'invasive' linked lists in your code, templates and 'smarter pointers' are going to save you headaches (IMO).
You are almost always served better by collection classes/templates.
With the caveats above:
The std::shared_ptr is a 'smarter pointer' that wraps a raw pointer (typically produced by calling operator new) and adds RAII style semantics to the mix along with a contract that allows multiple holders of the underlying object to reference the content without it going away. (When used properly.)
A linked list is 'just' a convention for a program to follow (hopefully, but not required to be) homogeneous types without moving the data around. With ('old school', not bad) linked lists ALL of the link management is your responsibility. It is extra easy to either forget, or get distracted and 'forget' to free resources. This is a cause of many nights of debugging and pesky things called 'memory leaks'.
The hybrid 'template link list' is 'better' in that the resource management responsibility is reduced. It is NOT eliminated. It will help reduce a type of memory leak that is forgetting to delete an allocated node. It will NOT eliminate 'memory leaks' that are caused by circular references. (This is case where an 'external actor' is required to break the circular chain and can be extraordinarily complex in 'real' code.)
std::shared_ptr has operators defined that allow you to 'pretend' that you are interacting with WHAT the std::shared pointer is managing. In that way, visually the code looks mostly the same, but the class is(/may be) hiding a little complexity on your behalf. (Pointer checking, etc.)
Which is better? IMO neither. However given a choice between ONLY those two, I would absolutely prefer the 'smarter pointer' version over the 'manual do it your self' style in the VAST majority of cases.
If it was ME I would pick another container. However, if your intent is to learn about the fundamentals of how those containers are implemented (a good thing!) this isn't exactly the answer you want to hear. :-)

Accelerated C++: Can I substitute raw pointers for smart pointers?

I love this book, sadly it does not cover smart pointers as they were not part of the standard back then. So when reading the book can I fairly substitute every mentioned pointer by a smart pointer, respectively reference?

"Smart Pointer" is a bit of a misnomer. The "smart" part is that they will do some things for you, whether or not you need, want, or even understand what those things are. And that's really important. Because sometimes you'll want to go to the store, and smart pointers will drive you to church. Smart pointers solve some very specific problems. Many would argue that if you think you need smart pointers, then you're probably solving the wrong problem. I personally try not to take sides. Instead, I use a toolbox metaphor - you need to really understand the problem you're solving, and the tools that you have at your disposal. Only then can you remotely expect to select the right tool for the job. Best of luck, and keep questioning!

Well, there are different kinds of smart pointers. For example:
You could create a scoped_ptr class, which would be useful when you're allocating for a task within a block of code and you want the resource to be freed automatically when it runs out of scope.
Something like:
template <typename T>
class scoped_ptr
{
public:
scoped_ptr(T* p = 0) : mPtr(p) {}
~scoped_ptr() { delete mPtr; }
//...
};
Additionally, you could create a shared_ptr that acts the same but keeps a ref count. Once the ref count reaches 0, you deallocate.
shared_ptr would be useful for pointers stored in STL containers and the like.
So yes, you could use smart pointers for most of the purposes of your program.
But think judiciously about what kind of smart pointer you need and why.
Do not simply "find and replace" all the pointers you come across.

No.
Pointers which represent object ownership should be replaced by smart pointers.
Other pointers should be replaced by iterators (which in the simplest case is just a typedef for a raw pointer, but no one would think they need to delete).
And of course, the implementation code for smart pointers and iterators will continue to need raw pointers.

Smart pointers - cases where they cannot replace raw pointers

Hi,
I have a question about smart pointers.
I heard from one of my friends that smart pointers can almost always replace raw pointers.
But when I asked him in which cases smart pointers cannot replace raw pointers, I did not get an answer.
Could anybody please tell me when and where they cannot replace raw pointers?

Passing pointers to legacy APIs.
Back-references in a reference-counted tree structure (or any cyclic situation, for that matter). This one is debatable, since you could use weak-refs.
Iterating over an array.
There are also many cases where you could use smart pointers but may not want to, e.g.:
Some small programs are designed to leak everything, because it just isn't worth the added complexity of figuring out how to clean up after yourself.
Fine-grained batch algorithms such as parsers might allocate from a pre-allocated memory pool, and then just blow away the whole pool on completion. Having smart pointers into such a pool is usually pointless.

An API that is going to be called from C, would be an obvious example.

Depends on the smart pointer you use. std::auto_ptr is not compatible with STL containers.

It's a matter of semantics:
smart pointer: you own (at least partly) the memory being pointed to, and as such are responsible for releasing it
regular pointer: you are being given a handle to an object... or not (NULL)
For example:
class FooContainer
{
public:
typedef std::vector<Foo> foos_t;
foos_t::const_iterator fooById(int id) const; // natural right ?
};
But you expose an implementation detail here. You could certainly create your own iterator class, but an iterator usually implies being incrementable and so on. Or you could use a pointer:
class FooContainer
{
public:
const Foo* fooById(int id) const;
};
It may return NULL, which indicates a failure, or it will return a pointer to an object whose memory you don't have to manage.
Of course, you could also use a weak_ptr here (you get the expired method); however, that would require using shared_ptr in the first place, and you might not use them in your implementation.
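A minimal sketch of the pointer-returning lookup might look like this. The Foo type with an id member, the add function, and the linear search are assumptions made for the example:

```cpp
#include <vector>

struct Foo {  // hypothetical element type
    int id;
};

class FooContainer {
public:
    void add(const Foo& f) { foos_.push_back(f); }

    // Returns a non-owning pointer to the matching element, or nullptr if
    // there is none. The caller never deletes the result. The pointer stays
    // valid only until the container is modified or destroyed.
    const Foo* fooById(int id) const {
        for (const Foo& f : foos_) {
            if (f.id == id) return &f;
        }
        return nullptr;
    }

private:
    std::vector<Foo> foos_;  // implementation detail, hidden from callers
};
```

Here the raw pointer expresses "a handle or nothing", and the vector remains free to be swapped for another container later without changing the interface.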

Interaction with legacy code. If the API needs a raw pointer, you need to provide a raw pointer, even if you wrap it in a smart pointer once it's in your code.

If you have a situation where a raw pointer is cast to an intptr_t and back for some reason, it cannot be replaced by a smart pointer because the casting operation would lose any reference counting information contained in the smart pointer.

It would be quite hard to implement smart pointers if at some point you don't use plain pointers.
I suppose it would also be harder to implement certain data structures with smart pointers. E.g., freeing the memory of a regular linked list is quite trivial, but it would take some thought to figure out the combination of owning and non-owning smart pointers to get the same result.
