I'm currently trying to implement the A* pathfinding algorithm using C++.
I'm having some problems with pointers... I usually find a way to avoid using them but now I guess I have to use them.
So let's say I have a "node" class (not related to A*) implemented like this:
class Node
{
public:
    int x;
    Node *parent;
    Node(int _x, Node *_parent)
        : x(_x), parent(_parent)
    { }
    bool operator==(const Node &rhs)
    {
        return x == rhs.x && parent == rhs.parent;
    }
};
It has a value (in this case, int x) and a parent (a pointer to another node) used to navigate through nodes with the parent pointers.
Now, I want to have a list of nodes which contains all the nodes that have been or are being considered. It would look like this:
std::vector<Node> nodes;
I want a list that contains pointers pointing to nodes inside the nodes list.
Declared like this:
std::vector<Node*> list;
However, I'm definitely not understanding pointers properly because my code won't work.
Here's the code I'm talking about:
std::vector<Node> nodes; // nodes that have been considered
std::vector<Node*> list; // pointers to nodes inside the nodes list
Node node1(1, NULL);     // create a node with an x value of 1 and no parent
Node node2(2, &node1);   // create a node with an x value of 2 and node1 as its parent
nodes.push_back(node1);
list.push_back(&nodes[0]);
// so far it works
// as soon as I add node2 to nodes, the pointer in "list" points to an object with
// strange data, with an x value of -17891602 and a parent 0xfeeefeee
nodes.push_back(node2);
list.push_back(&nodes[1]);
There is clearly undefined behaviour going on, but I can't manage to see where.
Could somebody please show me where my lack of understanding of pointers breaks this code and why?
So, the first issue that you have here is that you are using the address of individual Nodes of one of your vectors. But, over time, as you add more Node objects to your vector, those pointers may become invalid, because the vector may move the Nodes.
(The vector starts out at a certain pre-allocated size, and when you fill it up, it allocates a new, larger storage area and moves all of the elements to the new location. I'm betting that in your case, as soon as you add the second Node to nodes, it is doing this move.)
Is there a reason why you can't store the indices instead of the raw pointers?
One problem is that push_back can force a reallocation of the vector, i.e. it creates a larger block of memory, copies all existing elements to that larger block, and then deletes the old block. That invalidates any pointers you have to elements in the vector.
The problem is that, every time you add to a vector, it might need to expand its internal memory. If it does so, it allocates a new piece of storage, copies everything over, and deletes the old one, invalidating iterators and pointers to all of its objects.
As a solution to your problem, you could either:
- avoid reallocation by reserving enough space up front (nodes.reserve(42)),
- turn nodes into a std::list (which doesn't invalidate iterators or pointers to elements not directly affected by changes), or
- store indexes instead of pointers.
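To make the reserve and index options concrete, here is a minimal sketch. The struct and function names are made up for illustration, and the reserve variant assumes you know an upper bound on the number of nodes in advance.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    int x;
    Node* parent;  // null when there is no parent
};

// Variant A: reserve capacity up front, so push_back does not reallocate
// as long as the reserved capacity is not exceeded.
inline int parent_value_with_reserve() {
    std::vector<Node> nodes;
    nodes.reserve(2);                     // room for both nodes
    nodes.push_back(Node{1, nullptr});
    nodes.push_back(Node{2, &nodes[0]});  // &nodes[0] stays valid: no reallocation
    return nodes[1].parent->x;
}

// Variant B: store indices instead of pointers. Indices survive reallocation.
struct IndexedNode {
    int x;
    std::ptrdiff_t parent;  // index into the owning vector, -1 for "no parent"
};

inline int parent_value_with_indices() {
    std::vector<IndexedNode> nodes;
    std::vector<std::size_t> open;        // plays the role of "list" in the question
    nodes.push_back(IndexedNode{1, -1});
    open.push_back(0);
    nodes.push_back(IndexedNode{2, 0});   // may reallocate; indices are unaffected
    open.push_back(1);
    const IndexedNode& child = nodes[open[1]];
    return nodes[static_cast<std::size_t>(child.parent)].x;
}
```

Both functions return the parent's x value (1). The index variant keeps working however often the vector grows, while the reserve variant only holds as long as the reserved capacity is never exceeded.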
Besides your problem, but still worth mentioning:
The legal use of identifiers starting with underscores is rather limited. Yours is legal, but if you don't know the exact rules, you might want to avoid using them.
Your comparison operator doesn't promise that it won't change its left argument (it is not declared const). Also, operators that treat their operands equally (i.e. not modifying them, as opposed to, say, +=) are usually best implemented as free functions rather than as member functions.
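A sketch of what that advice could look like, assuming the Node from the question with only the parameter names changed (trailing instead of leading underscores) and the comparison moved out of the class:

```cpp
struct Node {
    int x;
    Node* parent;
    Node(int x_, Node* parent_) : x(x_), parent(parent_) {}
};

// Free function: both operands are taken by const reference, so the
// comparison also works on const objects and treats both sides alike.
inline bool operator==(const Node& lhs, const Node& rhs) {
    return lhs.x == rhs.x && lhs.parent == rhs.parent;
}

inline bool operator!=(const Node& lhs, const Node& rhs) {
    return !(lhs == rhs);
}
```

With the member version from the question, comparing two const Node objects would not compile; with this version it does.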
Just adding to the existing answers: instead of raw pointers, consider using some form of smart pointer. For example, if Boost is available, consider shared_ptr.
std::vector<boost::shared_ptr<Node> > nodes;
and
std::list<boost::shared_ptr<Node> > list;
Hence, you only need to create a single instance of Node, and it is "managed" for you. Inside the Node class, you have the option of a shared_ptr for parent (if you want to ensure that the parent Node does not get cleaned up until all child nodes are removed), or you can make that a weak_ptr.
Using shared pointers may also help alleviate problems where you want to store "handles" in multiple containers (i.e. you don't necessarily need to worry about ownership - as long as all references are removed, then the object will get cleaned up).
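Here is a rough sketch of that layout. It uses std::shared_ptr, the standard-library counterpart of boost::shared_ptr available since C++11; the Search struct and its add function are made-up helpers for illustration only.

```cpp
#include <list>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    int x;
    std::shared_ptr<Node> parent;  // keeps the parent alive while children exist
    Node(int x_, std::shared_ptr<Node> parent_)
        : x(x_), parent(std::move(parent_)) {}
};

// Hypothetical helper that keeps both containers from the question in sync.
struct Search {
    std::vector<std::shared_ptr<Node>> nodes;  // every node considered so far
    std::list<std::shared_ptr<Node>> open;     // handles to the same nodes

    std::shared_ptr<Node> add(int x, std::shared_ptr<Node> parent) {
        auto n = std::make_shared<Node>(x, std::move(parent));
        nodes.push_back(n);  // the vector may reallocate, but only the handles move
        open.push_back(n);
        return n;
    }
};
```

Because each Node lives in its own heap allocation, growing the vector moves only the handles, never the nodes, so parent links and the entries in open stay valid.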
Your code looks fine to me, but remember that when nodes goes out of scope, list becomes invalid.
Related
I'm coding a Fibonacci heap data structure (https://en.wikipedia.org/wiki/Fibonacci_heap) in C++.
This data structure consists of several heaps, with roots connected in a doubly-linked list. Each node has a doubly-linked list of its children. A whole heap has a doubly-linked list of leaf nodes, to support fast pruning. (CLRS 19-3.b)
My implementation of Node is:
struct Node {
    using Iterator = std::list<std::unique_ptr<Node>>::iterator;
    using LeafIterator = std::list<std::reference_wrapper<std::unique_ptr<Node>>>::iterator;
    Iterator parent;
    std::list<std::unique_ptr<Node>> child_list;
    T key;
    bool mark = false;
    bool is_leaf = false;
    LeafIterator leaf_iterator;
    Node(const T& key) : key {key} {}
};
My implementation of FibonacciHeap is:
using Iterator = std::list<std::unique_ptr<Node>>::iterator;
using LeafIterator = std::list<std::reference_wrapper<std::unique_ptr<Node>>>::iterator;
std::list<std::unique_ptr<Node>> NIL;
std::list<std::unique_ptr<Node>> root_list;
std::list<std::reference_wrapper<std::unique_ptr<Node>>> leaf_list;
Iterator min_element;
I used std::list<std::reference_wrapper<std::unique_ptr<Node>>> for leaf_list, instead of std::list<Node*>, because the memory of leaf nodes is solely owned by their parents, and I don't want a double-delete crash.
The problem arises when I attempt to delete a leaf node. I can access a leaf node to delete by leaf_list.begin(), but I cannot erase it from its parent's child_list.
I thought of two possible workarounds:
1. Perform a linear scan of the parent's child_list to get the std::list<std::unique_ptr<Node>>::iterator that matches the given leaf. This is linear, so it is slow.
2. Ditch leaf_list and give Node two member pointers, prev_leaf and next_leaf, to emulate a doubly linked list. I don't like this because it would make Nodes more bloated.
...I can't think of anything else for now.
What would be the best way to get std::list<std::unique_ptr<Node>>::iterator from std::reference_wrapper in this case?
Using a std::list<Node*> would not cause any double deletes, as long as you don't manually delete nodes, and let the actual unique_ptr pointers in child_list members handle that. You would just need to be careful to avoid using a dangling pointer after a Node has been destroyed. But this way still doesn't give a good way to quickly remove a Node* from the appropriate child_list.
Instead, you could maybe use std::list<Iterator> leaf_list;. This is relatively safe since inserts and erases on a std::list do not invalidate any iterators (except of course iterators to erased elements).
Though since you still have an invariant to follow, that the iterators in leaf_list belong to the appropriate child_list, it would be good to help code follow it. Depending on the intended usage and generality of the class, that might mean just putting notes in comments within or just before the struct Node definition. Or it might mean making Node a proper class with private members and a safer public interface - I might consider creating custom iterators using boost::iterator_adaptor to allow iteration over the leaf nodes without as much danger of breaking the invariant. If you don't expect much reuse, but then find it would be useful again in more contexts or projects, you could of course change these sorts of decisions later (unless too much code gets written using the raw way).
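A minimal sketch of the std::list<Iterator> leaf_list idea, under some simplifying assumptions: the key is a plain int instead of T, the parent is a hypothetical raw non-owning pointer rather than an iterator, and the helpers do not update the leaf status of the parent itself. The names add_leaf and erase_first_leaf are made up.

```cpp
#include <iterator>
#include <list>
#include <memory>

struct Node {
    using ChildList = std::list<std::unique_ptr<Node>>;
    using Iterator = ChildList::iterator;
    using LeafList = std::list<Iterator>;

    int key;
    Node* parent = nullptr;       // assumption: raw, non-owning back-pointer
    ChildList child_list;         // owns the children
    LeafList::iterator leaf_pos;  // position in the leaf list, meaningful only if is_leaf
    bool is_leaf = false;

    explicit Node(int k) : key(k) {}
};

// Adds a new leaf under `parent` and records its child_list iterator.
inline Node::Iterator add_leaf(Node& parent, Node::LeafList& leaves, int key) {
    parent.child_list.push_back(std::make_unique<Node>(key));
    Node::Iterator it = std::prev(parent.child_list.end());
    (*it)->parent = &parent;
    (*it)->is_leaf = true;
    (*it)->leaf_pos = leaves.insert(leaves.end(), it);
    return it;
}

// Removes the first leaf in constant time: no scan of the parent's child_list.
inline void erase_first_leaf(Node::LeafList& leaves) {
    Node::Iterator it = leaves.front();
    Node* parent = (*it)->parent;
    leaves.pop_front();
    parent->child_list.erase(it);  // the unique_ptr destroys the node
}
```

The stored iterator is exactly what child_list.erase needs, so ownership stays with the unique_ptr in child_list and nothing is deleted twice.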
I've made a dynamic graph structure where both nodes and arcs are classes (I mean arcs are actual instances in memory; they are not implied by an adjacency list of nodes to nodes).
Each node has a list of pointers to the arcs it's connected to.
Each arc has 2 pointers to the 2 nodes it's connecting.
Deleting a node calls delete for each of its arcs.
Each arc delete removes its pointer from the arcs lists in the 2 nodes it connects.
Simplified:
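~node()
{
    // each arc's destructor removes it from arcs_list, so this loop terminates
    while (!arcs_list.empty())
    {
        delete arcs_list.back();
    }
}
~arc()
{
    // node_from and node_to are pointers to the two connected nodes
    node_from->remove_arc(this);
    node_to->remove_arc(this);
}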
If I want to start using smart pointers here, how do I proceed?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
I was thinking about a shared_ptr, but a shared pointer would only delete the arc when both nodes are deleted. If I delete only one node, I would still have to explicitly delete all its arcs if I used shared_ptr. And that totally defeats the point of not using raw pointers in the first place.
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
Is there some other kind of smart pointer I should use to handle this?
Or is a raw pointer just the plain simple way to go?
> Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
You answered this question yourself:
> Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
When object A owns object B, then object A can exist after destroying B, but destroying A implies destroying B. Applied to your case, the two nodes share ownership of the arc.
> Is there some other kind of smart pointer I should use to handle this? Or is raw pointer just the plain simple way to go?
Ah, yes. That is the real question. There is no pre-made smart pointer for this situation. However, I would not go with raw pointers in your node and/or arc classes. That would mean those classes would need to implement memory management on top of their primary purpose. (Much better to let each class do one thing well than try to do many things and fail.) I see a few viable options.
1: Write your own smart pointer
Write a class that can encapsulate the necessary destruction logic. The node and/or arc classes would use your new class instead of standard smart pointers (and instead of raw pointers). Take some time to make sure your design decisions are solid. I'm guessing your new class would want a functional/callable of some sort to tell it how to remove itself from the lists it is in. Or maybe shift some data (like the pointers to the nodes) from the arc class to the new class.
I haven't worked out the details, but this would be a reasonable approach since the situation does not fit any of the standard smart pointers. The key point is to not put this logic directly in your node and arc classes.
2: Flag invalid arcs
If your program can stand not immediately releasing memory, you may be able to take a different approach to resolving an arc deletion. Instead of immediately removing an arc from its nodes' lists, simply flag the arc as no longer valid. When a node needs to access its arcs, it (or better yet, its list) would check each arc it accesses; if the arc is invalid, it can be removed from the list at that time. Once the arc has been removed from both lists, the normal shared_ptr functionality will kick in to delete the arc object.
The usefulness of this approach decreases the less frequently a node iterates over its arcs. So there is a judgement call to be made.
How would an arc be flagged invalid? The naive approach would be to give it a boolean flag. Set the flag to false in the constructors, and to true when the arc should be considered deleted. Effective, but does require a new field. Can this be done without bloating the arc class? Well, presumably, each arc needs pointers to its nodes. Since the arc does not own its nodes, these are probably weak pointers. So one way to define an arc being invalid is to check if either weak pointer is expired(). (Note that the weak pointers could be manually reset() when the arc is being deleted directly, not via a node's deletion. So an expired weak pointer need not mean the associated node is gone, only that the arc no longer points to it.)
In the case where the arc class is sizeable, you might want to discard most of its memory immediately, leaving just a stub behind. You could add a level of indirection to accomplish this. Essentially, the nodes would share a pointer to a unique pointer, and the unique pointer would point to what you currently call your arc class. When the arc is deleted, the unique pointer is reset(), freeing most of the arc's memory. An arc is invalid when this unique pointer is null. (It looks like Davis Herring's answer is another way to get this effect with less memory overhead, if you can accept an object storing a shared_ptr to itself.)
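To illustrate option 2, here is a small sketch using the naive boolean flag. All names (dead, live_arc_count, connect) are invented for the example, and it leaves out the arc's pointers back to its nodes to keep things short.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Arc {
    bool dead = false;  // the "naive" validity flag
    int weight = 0;
};

struct Node {
    std::vector<std::shared_ptr<Arc>> arcs;

    // Drops arcs that were flagged as dead, then reports how many remain.
    std::size_t live_arc_count() {
        arcs.erase(std::remove_if(arcs.begin(), arcs.end(),
                                  [](const std::shared_ptr<Arc>& a) { return a->dead; }),
                   arcs.end());
        return arcs.size();
    }

    // Destroying a node flags its arcs; the other endpoint drops them lazily.
    ~Node() {
        for (auto& a : arcs) a->dead = true;
    }
};

// Creates an arc that is shared by both nodes.
inline std::shared_ptr<Arc> connect(Node& a, Node& b, int weight) {
    auto arc = std::make_shared<Arc>();
    arc->weight = weight;
    a.arcs.push_back(arc);
    b.arcs.push_back(arc);
    return arc;
}
```

When one node is destroyed, the arc object stays allocated until the surviving node next prunes its list. That is the delayed release of memory this option trades for simplicity.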
3: Use Boost.Bimap
If you can use Boost, they have a container that looks like it would solve your problem: Boost.Bimap. But, you ask, didn't I already discount using an adjacency list? Yes, but this Bimap is more than just a way to associate nodes to each other. This container supports having additional information associated with each relation. That is, each relation in the Bimap would represent an arc and it would have an associated object with the arc's information. Seems to fit your situation well, and you would be able to let someone else worry about memory management (always a nice thing, provided you can trust that someone's abilities).
Since nodes can exist alone, they are owned by the graph (which might or might not be a single object), not the arcs (even as shared ownership). The ownership of an arc by its nodes is, as you observed, dual to the usual shared_ptr situation of either owner being sufficient to keep the object alive. You can nonetheless use shared_ptr and weak_ptr here (along with raw, non-owning pointers to the nodes):
#include <memory>
#include <vector>

struct Node;
struct Arc {
    Node *a, *b;
private:
    std::shared_ptr<Arc> skyhook{this};  // the arc keeps itself alive until free() is called
public:
    void free() { skyhook.reset(); }
};
struct Node {
    std::vector<std::weak_ptr<Arc>> arcs;
    ~Node() {
        for (const auto &w : arcs)
            if (const auto a = w.lock()) a->free();
    }
};
Obviously other Node operations have to check for empty weak pointers and perhaps clean them out periodically.
Note that exception safety (including vs. bad_alloc in constructing the shared_ptr) requires more care in constructing an Arc.
I am attempting to implement a graph in C++ where each node is an instance of class A. My first instinct was to represent the collection of nodes in a Graph object by a vector of objects of type A.
However, I wanted to implement the following functionality as well: Suppose I overload the + operator so that when g_2 = g_0 + g_1 (where the g's are instances of the Graph class), g_2's nodes consist of the combined nodes of g_0 and g_1. If I modify any of the nodes in g_2, g_0 and g_1's nodes will remain unchanged and in a certain sense g_2 will no longer remain the sum of g_0 and g_1. However, if I instead represent the nodes in a graph as a vector of pointers to objects of type A, then modifying g_2 will modify g_0 and g_1, and vice versa, which would be desirable for the project I am working on.
That being said, I can't help but suspect that having a vector of pointers as a data member is dangerous, but I don't know enough to really say.
A vector of pointers is fine. You just need to take care of the memory management of your objects yourself, but this is definitely doable.
You will need to allocate your objects with new A() and delete them with delete pointer_to_a if you want to get rid of them.
Having a vector of pointers is fine. I think it is common practice.
It won't be hard or dangerous if you use smart pointers like shared_ptr.
typedef std::shared_ptr<A> A_ptr;
std::vector<A_ptr> nodes;
It will take care of memory management.
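As a rough sketch of how the g_2 = g_0 + g_1 behaviour from the question could work with shared_ptr: the member names below (value, nodes, add) are assumptions, since the question does not show class A or Graph.

```cpp
#include <memory>
#include <vector>

struct A {
    int value = 0;  // hypothetical payload
};

struct Graph {
    std::vector<std::shared_ptr<A>> nodes;

    void add(int value) {
        auto n = std::make_shared<A>();
        n->value = value;
        nodes.push_back(n);
    }
};

// The sum shares the node objects of both operands rather than copying them.
inline Graph operator+(const Graph& lhs, const Graph& rhs) {
    Graph sum;
    sum.nodes = lhs.nodes;  // copies the handles, not the A objects
    sum.nodes.insert(sum.nodes.end(), rhs.nodes.begin(), rhs.nodes.end());
    return sum;
}
```

Modifying a node through g_2 is then visible through g_0 or g_1 as well, and each A is destroyed only when the last graph holding it goes away.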
I have a graph implemented using a struct Node and a struct Edge where:
Each Edge has a start and an end Node
Each Node maintains a list of Edge objects which start from or end at it
The following is one possible implementation:
struct Node;
struct Edge {
    Node *st;
    Node *en;
    int some_data;
};
const int MAX_EDGES = 100;
struct Node {
    Edge *edges[MAX_EDGES];
    int some_data;
};
While the above structs can represent the graph I have in mind, I would like to do it the "Modern C++" way while satisfying the following requirements:
1. Avoid pointers
2. Use an std::vector for Node::edges
3. Be able to store Node and Edge objects in standard C++ containers
How is this done in Modern C++? Can all of 1-3 be achieved?
> Avoid pointers
You can use std::shared_ptr and std::weak_ptr for this. Just decide whether you want nodes to own edges, or vice versa. The non-owning type should use weak_ptr (to avoid cycles).
Unless your graph is acyclic you might still need to be careful about ownership cycles.
std::unique_ptr is not an option, because there is not a one-to-one relationship between nodes and edges, so there cannot be a unique owner of any given object.
> Use an std::vector for Node::edges
No problem. Make it a std::vector<std::weak_ptr<Edge>> or std::vector<std::shared_ptr<Edge>> (depending on whether edges own nodes or vice versa).
> Be able to store Node and Edge objects in standard C++ containers
No problem, just ensure your type can be safely moved/copied without leaking or corrupting memory, i.e. has correct copy/move constructors and assignment operators. That will happen automatically if you use smart pointers and std::vector as suggested above.
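One possible shape of this, as a sketch only: here nodes own their edges (shared_ptr), edges refer back to nodes without owning them (weak_ptr), and the nodes themselves are assumed to be owned by some outer container. The connect helper is made up.

```cpp
#include <memory>
#include <vector>

struct Node;

struct Edge {
    std::weak_ptr<Node> st;  // non-owning, so no ownership cycle
    std::weak_ptr<Node> en;
    int some_data = 0;
};

struct Node {
    std::vector<std::shared_ptr<Edge>> edges;  // nodes share ownership of edges
    int some_data = 0;
};

// Creates an edge and registers it with both endpoints.
inline std::shared_ptr<Edge> connect(const std::shared_ptr<Node>& st,
                                     const std::shared_ptr<Node>& en,
                                     int data) {
    auto e = std::make_shared<Edge>();
    e->st = st;
    e->en = en;
    e->some_data = data;
    st->edges.push_back(e);
    en->edges.push_back(e);
    return e;
}
```

Both structs use the compiler-generated copy and move operations, so they can be stored in standard containers as they are. An edge can detect that an endpoint is gone by checking expired() or the result of lock().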
Modern C++ eschews the assignment of dynamic memory to a raw pointer, because it is all too easy to forget to delete said pointer. Having said that, there is nothing wrong with using a raw pointer as a reference to an object, provided you can guarantee that the object's lifetime will exceed the use of said pointer.
The rules generally are:
1. Use std::unique_ptr if an object has a single owner.
2. Use raw pointers to reference objects created in 1, provided you can guarantee that the object's lifetime will exceed the use of your reference.
3. Use std::shared_ptr for reference-counted objects.
4. Use std::weak_ptr to refer to a reference-counted object when you do not want to increase the reference count.
So in your case, if the Edge owns the Nodes then use std::unique_ptr; if not, then keep the raw pointers.
In your Node class, if the Node owns the Edges, use a std::vector<Edge>; otherwise use a std::vector<Edge*>, although it might be more efficient to link your Edges together in their own intrusive linked list.
Having done some work on complex graphs, I think it might be worth allocating all your Nodes and Edges in a vector outside your graph and then referring to them internally using only raw pointers inside the graph. Remember that memory allocation is slow, so the less you do, the faster your algorithm will be.
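A sketch of that last suggestion, assuming the maximum number of nodes and edges is known up front. GraphStorage and its member functions are hypothetical names; capacity is reserved once so the raw pointers handed out stay valid.

```cpp
#include <cstddef>
#include <vector>

struct Node;

struct Edge {
    Node* st;
    Node* en;
    int some_data;
};

struct Node {
    std::vector<Edge*> edges;  // non-owning references into GraphStorage::edges
    int some_data;
};

// Owns all nodes and edges. Capacity is fixed at construction so the vectors
// never reallocate, which would invalidate the raw pointers.
struct GraphStorage {
    std::vector<Node> nodes;
    std::vector<Edge> edges;

    GraphStorage(std::size_t max_nodes, std::size_t max_edges) {
        nodes.reserve(max_nodes);
        edges.reserve(max_edges);
    }
    GraphStorage(const GraphStorage&) = delete;             // a copy would point
    GraphStorage& operator=(const GraphStorage&) = delete;  // into the original

    // Returns nullptr instead of growing past the reserved capacity.
    Node* add_node(int data) {
        if (nodes.size() == nodes.capacity()) return nullptr;
        nodes.push_back(Node{{}, data});
        return &nodes.back();
    }

    Edge* add_edge(Node* st, Node* en, int data) {
        if (edges.size() == edges.capacity()) return nullptr;
        edges.push_back(Edge{st, en, data});
        Edge* e = &edges.back();
        st->edges.push_back(e);
        en->edges.push_back(e);
        return e;
    }
};
```

All memory for nodes and edges comes from two allocations, and everything is released together when the GraphStorage object is destroyed. The per-node edges vectors still allocate on their own.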
By using std::shared_ptr or std::unique_ptr
I don't think vector is the right choice here, since a graph usually is not linear (in most cases you can't linearize it like you can with a heap).
There is no standard general-use container, but you can use templates here for genericity.
For example, your Element class can look like this:
template <class T>
struct Elem {
    std::shared_ptr<Node> st, en;
    T some_data;
};
Speaking of modern C++, I don't think struct is encouraged here; you should encapsulate your data.
So this is a bit of a conceptual question. I'm writing a LinkedList in C++, and as Java is my first language, I started to write my removeAll function so that it just joins the head and the tail nodes (I'm using sentinel nodes, by the way). But I instantly realized that this won't work in C++ because I have to free the memory for the nodes!
Is there some way around iterating through the entire list, deleting every element manually?
You can make each node own the next one, i.e. be responsible for destroying it when it is destroyed itself. You can do this by using a smart pointer like std::unique_ptr:
struct node {
    // blah blah
    std::unique_ptr<node> next;
};
Then you can just destroy the first node and all the others will be accounted for: they will all be destroyed in a chain reaction of unique_ptr destructors.
If this is a doubly-linked list, you should not use unique_ptrs in both directions, however. That would make each node own the next one, and be owned by the next one! You should make this ownership relation exist only in one direction. In the other use regular non-owning pointers: node* previous;
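A small sketch of that ownership scheme without sentinels. The live counter and make_list are made up for the example, purely to make the chain of destructions observable.

```cpp
#include <memory>
#include <utility>

struct node {
    int value;
    std::unique_ptr<node> next;  // owning link
    node* previous = nullptr;    // non-owning back link

    static int live;             // hypothetical counter of nodes currently alive
    explicit node(int v) : value(v) { ++live; }
    ~node() { --live; }
};
int node::live = 0;

// Builds the list 1 -> 2 -> ... -> n and returns the owning head pointer.
inline std::unique_ptr<node> make_list(int n) {
    std::unique_ptr<node> head;
    for (int i = n; i >= 1; --i) {
        auto fresh = std::make_unique<node>(i);
        if (head) head->previous = fresh.get();
        fresh->next = std::move(head);
        head = std::move(fresh);
    }
    return head;
}
```

Resetting or destroying the head pointer destroys every node. Note that the destructors nest one inside the other, so for very long lists an explicit loop that moves next out of each node before it is destroyed avoids deep recursion.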
However, this will not work as is for the sentinel node: it should not be destroyed. How to handle that depends on how the sentinel node is identified and other properties of the list.
If you can tell the sentinel node apart easily, like, for example, checking a boolean member, you can use a custom deleter that avoids deleting the sentinel:
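struct node;  // forward declaration so the deleter can name the type

struct delete_if_not_sentinel {
    void operator()(node* ptr) const;  // defined below, once node is complete
};
typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;
struct node {
    // blah blah
    bool is_sentinel;  // the boolean member that identifies the sentinel
    node_handle next;
};
inline void delete_if_not_sentinel::operator()(node* ptr) const {
    if (!ptr->is_sentinel) delete ptr;
}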
This stops the chain reaction at the sentinel.
You could do it like Java if you used a C++ garbage collector. Not many do. In any case, it saves you at most a constant factor in running time, as you pay the cost of allocating each element in the list anyway.
Yes. Well, sort of... If you implement your list to use a memory pool then it is responsible for all data in that pool and the entire list can be deleted by deleting the memory pool (which may contain one or more large chunks of memory).
When you use memory pools, you generally have at least one of the following considerations:
- limitations on how your objects are created and destroyed;
- limitations on what kind of data you can store;
- extra memory requirements on each node (to reference the pool);
- a simple, intuitive pool versus a complex, confusing pool.
I am no expert on this. Generally when I've needed fast memory management it's been for memory that is populated once, with no need to maintain free-lists etc. Memory pools are much easier to design and implement when you have specific goals and design constraints. If you want some magic bullet that works for all situations, you're probably out of luck.
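For what it's worth, here is a deliberately simple sketch of the pool idea for a singly linked list of ints. PooledList is a made-up class: nodes live in one std::vector and link to each other by index, so removing everything means clearing the vector. It also shows the first limitation listed above, because nodes cannot be freed individually.

```cpp
#include <cstddef>
#include <vector>

class PooledList {
    struct Node {
        int value;
        std::size_t next;  // index of the next node in pool_, or npos
    };
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    std::vector<Node> pool_;   // the "memory pool": owns every node
    std::size_t head_ = npos;

public:
    void push_front(int v) {
        pool_.push_back(Node{v, head_});
        head_ = pool_.size() - 1;
    }

    // Releases every node at once; no per-node delete is needed.
    void remove_all() {
        pool_.clear();
        head_ = npos;
    }

    std::size_t size() const { return pool_.size(); }

    // Walks the links from the head, just to show the list is traversable.
    int sum() const {
        int total = 0;
        for (std::size_t i = head_; i != npos; i = pool_[i].next)
            total += pool_[i].value;
        return total;
    }
};
```

Linking by index rather than by pointer keeps the links valid when the vector grows.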