C++ Linked List remove all - c++

So this is a bit of a conceptual question. I'm writing a LinkedList in C++, and as Java is my first language, I started to write my removeAll function so that it just joins the head and the tail nodes (I'm using sentinel nodes, btw). But I instantly realized that this won't work in C++ because I have to free the memory for the nodes!
Is there some way around iterating through the entire list, deleting every element manually?

You can make each node own the next one, i.e. be responsible for destroying it when it is destroyed itself. You can do this by using a smart pointer like std::unique_ptr:
struct node {
// blah blah
std::unique_ptr<node> next;
};
Then you can just destroy the first node and all the others will be accounted for: they will all be destroyed in a chain reaction of unique_ptr destructors.
If this is a doubly-linked list, you should not use unique_ptrs in both directions, however. That would make each node own the next one, and be owned by the next one! You should make this ownership relation exist only in one direction. In the other use regular non-owning pointers: node* previous;
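As a minimal sketch of this ownership scheme (the names Node, List, push_front, remove_all and the live counter are made up for illustration, and sentinel nodes are left out), each node owns its successor through a unique_ptr and only observes its predecessor through a raw pointer:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node type: owns its successor, only observes its predecessor.
struct Node {
    static int live;              // number of Node objects currently alive (illustration only)
    int value;
    std::unique_ptr<Node> next;   // owning link
    Node* previous = nullptr;     // non-owning back link

    explicit Node(int v) : value(v) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

struct List {
    std::unique_ptr<Node> head;

    void push_front(int v) {
        auto n = std::make_unique<Node>(v);
        if (head) head->previous = n.get();
        n->next = std::move(head);
        head = std::move(n);
        assert(head->previous == nullptr);   // the new head has no predecessor
    }

    // Destroying the first node destroys the rest through the chain of `next` destructors.
    void remove_all() { head.reset(); }
};
```

Keep in mind that the chain of destructors is recursive, one call per node, so for very long lists an explicit loop that detaches nodes one at a time may be the safer choice.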
However, this will not work as is for the sentinel node: it should not be destroyed. How to handle that depends on how the sentinel node is identified and other properties of the list.
If you can tell the sentinel node apart easily, like, for example, checking a boolean member, you can use a custom deleter that avoids deleting the sentinel:
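struct node; // forward declaration so the deleter and the typedef can name it
struct delete_if_not_sentinel {
    void operator()(node* ptr) const; // defined below, once node is complete
};
typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;
struct node {
    // blah blah
    bool is_sentinel = false;
    node_handle next;
};
inline void delete_if_not_sentinel::operator()(node* ptr) const {
    if(!ptr->is_sentinel) delete ptr;
}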
This stops the chain reaction at the sentinel.

You could do it like Java if you used a c++ garbage collector. Not many do. In any case, it saves you at most a constant factor in running time, as you spend the cost to allocate each element in the list anyway.

Yes. Well, sort of... If you implement your list to use a memory pool then it is responsible for all data in that pool and the entire list can be deleted by deleting the memory pool (which may contain one or more large chunks of memory).
When you use memory pools, you generally have at least one of the following considerations:
limitations on how your objects are created and destroyed;
limitations on what kind of data you can store;
extra memory requirements on each node (to reference the pool);
a simple, intuitive pool versus a complex, confusing pool.
I am no expert on this. Generally when I've needed fast memory management it's been for memory that is populated once, with no need to maintain free-lists etc. Memory pools are much easier to design and implement when you have specific goals and design constraints. If you want some magic bullet that works for all situations, you're probably out of luck.
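To make the trade-off concrete, here is a rough sketch of a very simple pool, assuming a std::deque is acceptable as the backing storage (PoolNode and PooledList are hypothetical names). It shows one of the limitations listed above: nodes can only be released all at once, not individually.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical node type; it owns nothing, the pool owns all storage.
struct PoolNode {
    int value;
    PoolNode* next;
};

class PooledList {
    std::deque<PoolNode> pool_;   // grows in chunks; push_back never relocates existing elements
    PoolNode* head_ = nullptr;
public:
    void push_front(int v) {
        pool_.push_back(PoolNode{v, head_});
        head_ = &pool_.back();
    }

    // Releases every node at once by dropping the pool's contents.
    void remove_all() {
        head_ = nullptr;
        pool_.clear();
    }

    std::size_t count() const {
        std::size_t n = 0;
        for (const PoolNode* p = head_; p != nullptr; p = p->next) ++n;
        assert(n == pool_.size());   // every pooled node is reachable from head_
        return n;
    }
};
```

Note that clear() still visits each element internally; what the pool removes is the per-node bookkeeping in the list code itself.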

Related

Getting into smart pointers, how to deal with representing ownership?

i've made a dynamic graph structure where both nodes and arcs are classes (i mean arcs are an actual instance in memory, they are not implied by an adjacency list of nodes to nodes).
Each node has a list of pointers to the arcs it's connected to.
Each arc has 2 pointers to the 2 nodes it's connecting.
Deleting a node calls delete for each of its arcs.
Each arc delete removes its pointer from the arcs lists in the 2 nodes it connects.
Simplified:
~node()
{
while(arcs_list.size())
{
delete arcs_list[arcs_list.size()-1];
}
}
~arc()
{
node_from->remove_arc(this); // node_from and node_to are pointers to the two nodes
node_to->remove_arc(this);
}
If i want to start using smart pointers here, how do i proceed?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
I was thinking about a shared_ptr, but shared pointer would only delete the arc when both nodes are deleted. If i delete only one node i would still have to explicitly delete all its arcs if i used shared_ptr. And that totally defeats the point of not using raw pointers in the first place.
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
Is there some other kind of smart pointer i should use to handle this?
Or is raw pointer just the plain simple way to go?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
You answered this question yourself:
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
When object A owns object B, then object A can exist after destroying B, but destroying A implies destroying B. Applied to your case, the two nodes share ownership of the arc.
Is there some other kind of smart pointer i should use to handle this? Or is raw pointer just the plain simple way to go?
Ah, yes. That is the real question. There is no pre-made smart pointer for this situation. However, I would not go with raw pointers in your node and/or arc classes. That would mean those classes would need to implement memory management on top of their primary purpose. (Much better to let each class do one thing well than try to do many things and fail.) I see a few viable options.
1: Write your own smart pointer
Write a class that can encapsulate the necessary destruction logic. The node and/or arc classes would use your new class instead of standard smart pointers (and instead of raw pointers). Take some time to make sure your design decisions are solid. I'm guessing your new class would want a functional/callable of some sort to tell it how to remove itself from the lists it is in. Or maybe shift some data (like the pointers to the nodes) from the arc class to the new class.
I haven't worked out the details, but this would be a reasonable approach since the situation does not fit any of the standard smart pointers. The key point is to not put this logic directly in your node and arc classes.
2: Flag invalid arcs
If your program can stand not immediately releasing memory, you may be able to take a different approach to resolving an arc deletion. Instead of immediately removing an arc from its nodes' lists, simply flag the arc as no longer valid. When a node needs to access its arcs, it (or better yet, its list) would check each arc it accesses – if the arc is invalid, it can be removed from the list at that time. Once the arc has been removed from both lists, the normal shared_ptr functionality will kick in to delete the arc object.
The usefulness of this approach decreases the less frequently a node iterates over its arcs. So there is a judgement call to be made.
How would an arc be flagged invalid? The naive approach would be to give it a boolean flag. Set the flag to false in the constructors, and to true when the arc should be considered deleted. Effective, but does require a new field. Can this be done without bloating the arc class? Well, presumably, each arc needs pointers to its nodes. Since the arc does not own its nodes, these are probably weak pointers. So one way to define an arc being invalid is to check if either weak pointer is expired(). (Note that the weak pointers could be manually reset() when the arc is being deleted directly, not via a node's deletion. So an expired weak pointer need not mean the associated node is gone, only that the arc no longer points to it.)
In the case where the arc class is sizeable, you might want to discard most of its memory immediately, leaving just a stub behind. You could add a level of indirection to accomplish this. Essentially, the nodes would share a pointer to a unique pointer, and the unique pointer would point to what you currently call your arc class. When the arc is deleted, the unique pointer is reset(), freeing most of the arc's memory. An arc is invalid when this unique pointer is null. (It looks like Davis Herring's answer is another way to get this effect with less memory overhead, if you can accept an object storing a shared_ptr to itself.)
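A rough sketch of option 2 might look like the following. It assumes the nodes themselves are held in shared_ptrs by whatever owns the graph, so that arcs can refer to them through weak_ptrs; the names connect, valid and prune are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node;

struct Arc {
    std::weak_ptr<Node> from, to;   // an arc does not own its nodes
    // An arc counts as invalid as soon as either endpoint is gone.
    bool valid() const { return !from.expired() && !to.expired(); }
};

struct Node {
    std::vector<std::shared_ptr<Arc>> arcs;   // the two endpoints share ownership of the arc

    // Drops invalid arcs lazily; returns how many arcs remain.
    std::size_t prune() {
        arcs.erase(std::remove_if(arcs.begin(), arcs.end(),
                                  [](const std::shared_ptr<Arc>& arc) { return !arc->valid(); }),
                   arcs.end());
        return arcs.size();
    }
};

// Creates an arc between two nodes and registers it with both of them.
inline std::shared_ptr<Arc> connect(const std::shared_ptr<Node>& a,
                                    const std::shared_ptr<Node>& b) {
    assert(a && b);
    auto arc = std::make_shared<Arc>();
    arc->from = a;
    arc->to = b;
    a->arcs.push_back(arc);
    b->arcs.push_back(arc);
    return arc;
}
```

When one node is destroyed, the arc stays alive in the other node's list until that node next calls prune(), at which point the last shared_ptr goes away and the arc is deleted.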
3: Use Boost.Bimap
If you can use Boost, they have a container that looks like it would solve your problem: Boost.Bimap. But, you ask, didn't I already discount using an adjacency list? Yes, but this Bimap is more than just a way to associate nodes to each other. This container supports having additional information associated with each relation. That is, each relation in the Bimap would represent an arc and it would have an associated object with the arc's information. Seems to fit your situation well, and you would be able to let someone else worry about memory management (always a nice thing, provided you can trust that someone's abilities).
Since nodes can exist alone, they are owned by the graph (which might or might not be a single object), not the arcs (even as shared ownership). The ownership of an arc by its nodes is, as you observed, dual to the usual shared_ptr situation of either owner being sufficient to keep the object alive. You can nonetheless use shared_ptr and weak_ptr here (along with raw, non-owning pointers to the nodes):
struct Node;
struct Arc {
Node *a,*b;
private:
std::shared_ptr<Arc> skyhook{this};
public:
void free() {skyhook.reset();}
};
struct Node {
std::vector<std::weak_ptr<Arc>> arcs;
~Node() {
for(const auto &w : arcs)
if(const auto a=w.lock()) a->free();
}
};
Obviously other Node operations have to check for empty weak pointers and perhaps clean them out periodically.
Note that exception safety (including vs. bad_alloc in constructing the shared_ptr) requires more care in constructing an Arc.

What is the best way to put large objects on the heap?

I am working on a project that needs to load many objects from a data file and store them in memory. Since I have been told that stack space is rare and larger amounts of data should be on the heap I put everything on the heap. However, my impression is that I overdid it a little bit.
My current design looks like this:
class RoadMap
{
unique_ptr<set<unique_ptr<Node>>> allNodes;
void addNode(unique_ptr<Node> node)
{
this->allNodes->insert(std::move(node));
}
};
int main()
{
unique_ptr<RoadMap> map(new RoadMap());
// open file etc.
for (auto nodeData : nodesInFile)
{
map->addNode(unique_ptr<Node>(new Node(nodeData)));
}
}
From what I understand by now, this creates a lot of overhead because there are many unique pointers involved that I think I do not need. If I understand correctly, it should be sufficient to only have one unique pointer barrier in the "pointer chain". However, I am unsure what the best practice is to do this.
Option 1
class RoadMap
{
unique_ptr<set<Node>> allNodes;
void addNode (Node node)
{
this->allNodes->insert(node);
}
};
int main()
{
RoadMap map;
//open file etc.
for (auto nodeData : nodesInFile)
{
map.addNode(Node(nodeData));
}
}
The advantage of this seems to me that the RoadMap class itself is the only one that needs to take care of heap allocation and does so only once when creating the set.
Option 2
class RoadMap
{
set<Node> allNodes;
void addNode (Node node)
{
this->allNodes.insert(node);
}
};
int main()
{
unique_ptr<RoadMap> map(new RoadMap());
// open file etc.
for (auto nodeData : nodesInFile)
{
map->addNode(Node(nodeData));
}
}
Here the unique pointer is only in the main function, meaning that the users of the RoadMap class will need to know that this object can become quite large and should be put on the heap. I don't think that this is an overly nice solution.
Option 3
class RoadMap
{
set<unique_ptr<Node>> allNodes;
void addNode(unique_ptr<Node> node)
{
this->allNodes.insert(std::move(node));
}
};
int main()
{
RoadMap map;
// open file etc.
for (auto nodeData : nodesInFile)
{
map.addNode(unique_ptr<Node>(new Node(nodeData)));
}
}
This solution uses many unique pointers which means that when deleting the RoadMap many destructors and deletes will need to be called. Also the RoadMap caller has to supply a unique_ptr when adding a node meaning that he has to do the heap allocation himself.
Right now, I am favouring option 1 over the others. However, I have only been coding C++ for a comparatively short time and am unsure whether I fully understand the concepts behind memory management which is why I want you to (in)validate my opinion. Am I correct in assuming that option 1 is the best way to do this? Do you have any additional references to best practices for this sort of thing?
Give Node a move constructor and move assignment operator (to make operations on the set cheap), then use a mix of option 1 and 2. std::set will already be heap allocating its contents so you don't need to worry about allocating a RoadMap on the heap. Note the extra std::move inside addNode to allow Nodes to be moved into the set.
class RoadMap
{
    set<Node> allNodes;
public: // addNode must be public to be callable from main
    void addNode(Node node)
    {
        allNodes.emplace(std::move(node));
    }
};
int main()
{
RoadMap map;
// open file etc.
for (const auto& nodeData : nodesInFile)
{
map.addNode(Node(nodeData));
}
}
Each of them is quite different from the others.
I would suggest option 2 for simplicity. But it might be more expensive in some operations, such as sorting, because you would be moving the entire Node and not a pointer to it.
I assume that is not a problem, since you are using set. You can still optimize this by using move semantics on your Node object. Without this you are still making one copy per add.
The issue I mention above might have been a problem with vector. Another issue you would have with storing the objects directly is the lack of polymorphism. You can't store subtypes of Node; they would get sliced.
If this is an issue, I would suggest option 3. Storing pointers means that moving them is faster, and polymorphism works.
I see no reason for Option 1 or your original solution.
p.s. the this-> in your code is unnecessary.
p.p.s. As DyP points out, set uses the heap anyway, which is what makes Option 2 good. A clue: stack-based structures cannot grow, so I believe only std::array stores its elements on the stack.
Let me talk a little about the meta problem: You don't want the stack to overflow and hence put your data structures on the heap. That's the right thing to do. But the important thing to understand here is when things will be put onto the heap.
Every local variable is allocated on the stack. If you have data structures of dynamic size, then they refer to the heap in (almost) all cases. (The only exception I know is when you reserve memory on the stack on purpose with alloca() or something like it.) In particular, all STL containers keep their elements on the heap, and hardly any stack memory is used for local variables or member variables (except std::array, whose size is known at compile time).
Hence wrapping dynamically sized data structures into unique_ptrs has very little effect, if you want to save stack memory, but it adds indirection to your program which complicates your code, slows down execution and increases heap memory usage unnecessarily.
Here's an example: On Visual Studio 2010 with 32-bit compilation, a std::set will use 20 bytes of memory on the stack, independent of the template type parameter and of the actual number of elements contained in the set. The memory for the set elements is on the heap.
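One way to check this on your own toolchain is sketched below. The exact byte counts are implementation-specific, so no numbers are claimed here; Big, handle_size and size_after_inserts are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical large element type (4 KiB payload).
struct Big {
    char payload[4096];
    int key;
    bool operator<(const Big& other) const { return key < other.key; }
};

// Size of the set object itself, i.e. what a local variable or a member occupies.
// The elements are not included; they live in separately allocated tree nodes.
template <class T>
constexpr std::size_t handle_size() { return sizeof(std::set<T>); }

// Inserting n (n >= 0) elements grows heap usage, not the size of the set object.
inline std::size_t size_after_inserts(int n) {
    std::set<Big> s;
    for (int i = 0; i < n; ++i) {
        Big b{};
        b.key = i;
        s.insert(b);
    }
    assert(s.size() == static_cast<std::size_t>(n));
    return sizeof(s);
}
```

On common implementations handle_size<Big>() comes out far smaller than sizeof(Big), which is the point made above: the set object holds only a few pointers and counters.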
I believe that you can now make your own decision on whether to use unique_ptrs for the purpose you intend.
Basically, it also depends on how you want to access the stored Node instances inside your RoadMap instance. I assume your Node instance will release the wrapped node data.
I would go for an adjusted version 2.

Implementing a list with unique_ptr<>?

As I understand it, a unique_ptr signifies exclusive ownership. A singly linked list seems to fit this, with each node owning the next, like (pseudocode alert)
class node{
public:
unique_ptr<node> next;
int value;
};
but I don't understand how to perform operations like traversing the list, where I'm used to doing
here=here->next;
How do you implement data structures using unique_ptr's? Are they the right tool for the job?
When you go through the nodes, you don't need to own the node pointer, which means that
here=here->next;
is incorrect if here is a unique_ptr.
Owning an object means "being responsible for its life and death", which means the owner is the one who has the code that will destroy the object. If you use another definition of owning, then it's not what unique_ptr means.
In your list node code, you assume that each node is responsible for the next node (if you destroy a node, all the following nodes will be destroyed too). It can be valid behaviour, depending on your needs; just be sure it's what you really want.
What you want is to read the pointer without owning it. Current good practice to do this is to use a raw pointer indicating a "use but don't own" kind of usage to other developers looking at this code (unique_ptr means "if I die, the pointed object dies too"):
node* here = nullptr; // it will not own the pointed nodes (don't call delete with this pointer)
here = &first_node(); // assuming first_node() returns a reference to the first node
here = here->next.get(); // to get the next node without owning it: use get() - true in all smart pointers interface
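Put together, a traversal over the node type from the question could be sketched as follows; push_front and sum are hypothetical helpers added only to make the example complete.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct node {
    std::unique_ptr<node> next;
    int value;
};

// Prepends a value; the new head takes ownership of the old head.
inline void push_front(std::unique_ptr<node>& head, int value) {
    std::unique_ptr<node> n(new node{std::move(head), value});
    head = std::move(n);
    assert(head && head->value == value);
}

// Walks the list through a non-owning raw pointer; ownership never changes.
inline int sum(const std::unique_ptr<node>& head) {
    int total = 0;
    for (const node* here = head.get(); here != nullptr; here = here->next.get())
        total += here->value;
    return total;
}
```

The unique_ptrs stay where they are for the whole traversal; only the raw pointer here moves from node to node.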

C++ vector of pointers problem

I'm currently trying to implement the A* pathfinding algorithm using C++.
I'm having some problems with pointers... I usually find a way to avoid using them but now I guess I have to use them.
So let's say I have a "node" class(not related to A*) implemented like this:
class Node
{
public:
int x;
Node *parent;
Node(int _x, Node *_parent)
: x(_x), parent(_parent)
{ }
bool operator==(const Node &rhs)
{
return x == rhs.x && parent == rhs.parent;
}
};
It has a value (in this case, int x) and a parent (a pointer to another node) used to navigate through nodes with the parent pointers.
Now, I want to have a list of nodes which contains all the nodes that have been or are being considered. It would look like this:
std::vector<Node> nodes;
I want a list that contains pointers pointing to nodes inside the nodes list.
Declared like this:
std::vector<Node*> list;
However, I'm definitely not understanding pointers properly because my code won't work.
Here's the code I'm talking about:
std::vector<Node> nodes;//nodes that have been considered
std::vector<Node*> list;//pointers to nodes inside the nodes list.
Node node1(1, NULL);//create a node with a x value of 1 and no parent
Node node2(2, &node1);//create a node with a x value of 2 and node1 being its parent
nodes.push_back(node1);
list.push_back(&nodes[0]);
//so far it works
//as soon as I add node2 to nodes, the pointer in "list" points to an object with
//strange data, with a x value of -17891602 and a parent 0xfeeefeee
nodes.push_back(node2);
list.push_back(&nodes[1]);
There is clearly undefined behaviour going on, but I can't manage to see where.
Could somebody please show me where my lack of understanding of pointers breaks this code and why?
So, the first issue that you have here is that you are using the address of individual Nodes of one of your vectors. But, over time, as you add more Node objects to your vector, those pointers may become invalid, because the vector may move the Nodes.
(The vector starts out at a certain pre-allocated size, and when you fill it up, it allocates a new, larger storage area and moves all of the elements to the new location. I'm betting that in your case, as soon as you add the second Node to nodes, it is doing this move.)
Is there a reason why you can't store the indices instead of the raw pointers?
One problem is that push_back can force a reallocation of the vector, i.e. it creates a larger block of memory, copies all existing elements to that larger block, and then deletes the old block. That invalidates any pointers you have to elements in the vector.
The problem is that, every time you add to a vector, it might need to expand its internal memory. If it does so, it allocates a new piece of storage, copies everything over, and deletes the old one, invalidating iterators and pointers to all of its objects.
As a solution to your problem, you could either
avoid reallocation by reserving enough space upfront (nodes.reserve(42))
turn nodes into a std::list (which doesn't invalidate iterators or pointers to elements not directly affected by changes)
store indexes instead of pointers.
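Two of these suggestions can be sketched briefly. IndexedNode, no_parent, add_node and pointer_survives_with_reserve are names invented for this example, and the parent link is stored as an index rather than a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t no_parent = static_cast<std::size_t>(-1);   // hypothetical "null" index

struct IndexedNode {
    int x;
    std::size_t parent;   // index into the owning vector instead of a pointer
};

// Appends a node and returns its index, which stays valid across reallocations.
inline std::size_t add_node(std::vector<IndexedNode>& nodes, int x, std::size_t parent) {
    assert(parent == no_parent || parent < nodes.size());
    nodes.push_back(IndexedNode{x, parent});
    return nodes.size() - 1;
}

// Alternative: with enough capacity reserved up front, push_back does not
// reallocate, so pointers to existing elements stay valid.
inline bool pointer_survives_with_reserve() {
    std::vector<IndexedNode> nodes;
    nodes.reserve(2);
    nodes.push_back(IndexedNode{1, no_parent});
    const IndexedNode* first = &nodes[0];
    nodes.push_back(IndexedNode{2, 0});   // fits in the reserved capacity
    return first == &nodes[0];
}
```

Indices remain meaningful however often the vector grows, while reserve only helps as long as the reserved capacity is never exceeded.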
Besides your problem, but still worth mentioning:
The legal use of identifiers starting with underscores is rather limited. Yours is legal, but if you don't know the exact rules, you might want to avoid using them.
Your comparison operator doesn't tell that it won't change its left argument. Also, operators treating their operands equally (i.e. not modifying them, as opposed to, say, +=), are usually best implemented as free functions, rather than as member functions.
Just adding to the existing answers: instead of raw pointers, consider using some form of smart pointer. For example, if Boost is available, consider shared_ptr.
std::vector<boost::shared_ptr<Node> > nodes;
and
std::list<boost::shared_ptr<Node> > list;
Hence, you only need to create a single instance of Node, and it is "managed" for you. Inside the Node class, you have the option of a shared_ptr for parent (if you want to ensure that the parent Node does not get cleaned up till all child nodes are removed), or you can make that a weak_ptr.
Using shared pointers may also help alleviate problems where you want to store "handles" in multiple containers (i.e. you don't necessarily need to worry about ownership - as long as all references are removed, then the object will get cleaned up).
Your code looks fine to me, but remember that when nodes goes out of scope, list becomes invalid.

Are data structures an appropriate place for shared_ptr?

I'm in the process of implementing a binary tree in C++. Traditionally, I'd have a pointer to left and a pointer to right, but manual memory management typically ends in tears. Which leads me to my question...
Are data structures an appropriate place to use shared_ptr?
I think it depends on where you'd be using them. I'm assuming that what you're thinking of doing is something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
shared_ptr<BinaryTreeNode<T> > left;
shared_ptr<BinaryTreeNode<T> > right;
T data;
};
This would make perfect sense if you're expecting your data structure to handle dynamically created nodes. However, since that's not the normal design, I think it's inappropriate.
My answer would be that no, it's not an appropriate place to use shared_ptr, as the use of shared_ptr implies that the object is actually shared - however, a node in a binary tree is not ever shared. However, as Martin York pointed out, why reinvent the wheel - there's already a smart pointer type that does what we're trying to do - auto_ptr. So go with something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
auto_ptr<BinaryTreeNode<T> > left;
auto_ptr<BinaryTreeNode<T> > right;
T data;
};
If anyone asks why data isn't a shared_ptr, the answer is simple - if copies of the data are good for the client of the library, they pass in the data item, and the tree node makes a copy. If the client decides that copies are a bad idea, then the client code can pass in a shared_ptr, which the tree node can safely copy.
Because left and right are not shared boost::shared_ptr<> is probably not the correct smart pointer.
This would be a good place to try std::auto_ptr<>
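Note that std::auto_ptr was deprecated in C++11 and removed in C++17, so with a current compiler the same single-owner idea would more likely be expressed with std::unique_ptr. A minimal sketch, where the insert and size members are assumptions added for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <class T>
class BinaryTreeNode {
public:
    explicit BinaryTreeNode(T value) : data(std::move(value)) {}

    // Inserts into the subtree rooted here (duplicates go to the right).
    void insert(T value) {
        std::unique_ptr<BinaryTreeNode>& child = (value < data) ? left : right;
        if (child) {
            child->insert(std::move(value));
            return;
        }
        child.reset(new BinaryTreeNode(std::move(value)));
        assert(child != nullptr);   // this node now owns the new child
    }

    std::size_t size() const {
        return 1 + (left ? left->size() : 0) + (right ? right->size() : 0);
    }

private:
    std::unique_ptr<BinaryTreeNode> left;    // each child has exactly one owner: its parent
    std::unique_ptr<BinaryTreeNode> right;
    T data;
};
```

Destroying the root releases the whole tree, with no hand-written destructor needed.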
Yes, absolutely.
But be careful if you have a circular data structure. If you have two objects, both with a shared ptr to each other, then they will never be freed without manually clearing the shared ptr. The weak ptr can be used in this case. This, of course, isn't a worry with a binary tree.
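The circular case can be sketched like this, with hypothetical Parent and Child types and live counters added only to observe destruction. The back reference is a weak_ptr, so it does not keep the parent alive:

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
    static int live;
    std::shared_ptr<Child> child;   // owning direction
    Parent() { ++live; }
    ~Parent() { --live; }
};

struct Child {
    static int live;
    std::weak_ptr<Parent> parent;   // non-owning back reference breaks the cycle
    Child() { ++live; }
    ~Child() { --live; }
};

int Parent::live = 0;
int Child::live = 0;

// Builds a parent/child pair that refer to each other.
inline std::shared_ptr<Parent> make_linked_pair() {
    auto p = std::make_shared<Parent>();
    p->child = std::make_shared<Child>();
    p->child->parent = p;
    assert(p.use_count() == 1);   // the weak back reference does not add an owner
    return p;
}
```

Had Child::parent been a shared_ptr instead, both objects would keep each other alive after the last outside reference was dropped.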
Writing memory management manually is not so difficult on those happy occasions where each object has a single owner, which can therefore delete what it owns in its destructor.
Given that a tree by definition consists of nodes which each have a single parent, and therefore an obvious candidate for their single owner, this is just such a happy occasion. Congratulations!
I think it would be well worth* developing such a solution in your case, AND also trying the shared_ptr approach, hiding the differences entirely behind an identical interface, so you can switch between the two and compare the difference in performance with some realistic experiments. That's the only sure way to know whether shared_ptr is suitable for your application.
(* for us, if you tell us how it goes.)
Never use shared_ptr for the nodes of a data structure. It can cause the destruction of the node to be suspended or delayed if at any point the ownership was shared. This can cause destructors to be called in the wrong sequence.
It is a good practice in data structures for the constructors of nodes to contain any code that couples with other nodes and the destructors to contain code that de-couples from other nodes. Destructors called in the wrong sequence can break this design.
There is a bit of extra overhead with a shared_ptr, notably in space requirements, but if your elements are individually allocated then shared_ptr would be perfect.
Do you even need pointers? It seems you could use boost::optional<BinaryTreeNode<T> > left, right.