This is more of a design problem (I know why this is happening; I just want to see how people deal with it). Suppose I have a simple linked list struct:
struct List {
int head;
std::shared_ptr<List> tail;
};
The shared_ptr enables sharing of sublists between multiple lists. However, when the list gets very long, a stack overflow might happen in its destructor (caused by recursive releases of shared_ptrs). I've tried using an explicit stack, but that gets very tricky since a tail can be owned by multiple lists. How can I design my List to avoid this problem?
UPDATE: To clarify, I'm not reinventing the wheel (std::forward_list). The List above is only a simplified version of the real data structure. The real data structure is a directed acyclic graph, which if you think about it is just a lot of linked lists with shared tails/heads. It's usually prohibitively expensive to copy the graph, so data sharing is necessary.
UPDATE 2: I'm thinking about explicitly traversing down the pointer chain, applying std::move as I go. Something like:
~List()
{
auto p = std::move(tail);
    while (p != nullptr && p->tail.use_count() == 1) {
        // Some other thread may start pointing to `p->tail`
        // and increase its use count before the next line
p = std::move(p->tail);
}
}
This seems to work in a single thread, but I'm worried about thread safety.
If you're having problems with stack overflows on destruction of your linked data structure, the easiest fix is to implement deferred cleanup:
struct Graph {
std::shared_ptr<Graph> p1, p2, p3; // some pointers in your datastructure
static std::list<std::shared_ptr<Graph>> deferred_cleanup;
~Graph() {
deferred_cleanup.emplace_back(std::move(p1));
deferred_cleanup.emplace_back(std::move(p2));
deferred_cleanup.emplace_back(std::move(p3));
}
static void cleanup() {
while (!deferred_cleanup.empty()) {
std::list<std::shared_ptr<Graph>> tmp;
std::swap(tmp, deferred_cleanup);
            tmp.clear();
        }
    }
};
and you just need to remember to call Graph::cleanup(); periodically.
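Note that the static member also needs one out-of-line definition to link (or it can be declared inline static in C++17). A minimal usage sketch, assuming a long chain built through the p1 pointers:

std::list<std::shared_ptr<Graph>> Graph::deferred_cleanup;

int main() {
    // Build an arbitrarily long chain; destroying it no longer recurses,
    // because each ~Graph just moves its pointers into deferred_cleanup.
    auto root = std::make_shared<Graph>();
    for (int i = 0; i < 1000000; ++i) {
        auto n = std::make_shared<Graph>();
        n->p1 = std::move(root);
        root = std::move(n);
    }
    root.reset();      // queues the whole chain
    Graph::cleanup();  // destroys it iteratively
}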
This should do it. With a little work it can easily be made thread-safe (a little locking/atomics in the deleter engine).
Synopsis:
The shared_ptrs to the nodes are created with a custom deleter which, rather than deleting the node, hands it off to a deleter engine.
The engine's implementation is a singleton. Upon being notified of a new node to be deleted, it adds the node to a delete queue. If there is no node being deleted, the nodes in the queue are deleted in turn (no recursion).
While this is happening, new nodes arriving in the engine are simply added to the back of the queue. The in-progress delete cycle will take care of them soon enough.
#include <memory>
#include <deque>
#include <stdexcept>
#include <iostream>
struct node;
struct delete_engine
{
void queue_for_delete(std::unique_ptr<node> p);
struct impl;
static impl& get_impl();
};
struct node
{
node(int d) : data(d) {}
~node() {
std::cout << "deleting node " << data << std::endl;
}
static std::shared_ptr<node> create(int d) {
return { new node(d),
[](node* p) {
auto eng = delete_engine();
eng.queue_for_delete(std::unique_ptr<node>(p));
}};
}
int data;
std::shared_ptr<node> child;
};
struct delete_engine::impl
{
bool _deleting { false };
std::deque<std::unique_ptr<node>> _delete_list;
void queue_for_delete(std::unique_ptr<node> p)
{
_delete_list.push_front(std::move(p));
if (!_deleting)
{
_deleting = true;
while(!_delete_list.empty())
{
_delete_list.pop_back();
}
_deleting = false;
}
}
};
auto delete_engine::get_impl() -> impl&
{
static impl _{};
return _;
}
void delete_engine::queue_for_delete(std::unique_ptr<node> p)
{
get_impl().queue_for_delete(std::move(p));
}
struct tree
{
std::shared_ptr<node> root;
auto add_child(int data)
{
if (root) {
throw std::logic_error("already have a root");
}
auto n = node::create(data);
root = n;
return n;
}
};
int main()
{
tree t;
auto pc = t.add_child(6);
pc = pc->child = node::create(7);
}
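For the thread safety mentioned above, one possible approach is to guard the queue with a mutex and release the lock while each node's destructor runs, since that destructor may re-enter the engine. A sketch of a thread-safe variant of the impl above (not part of the original listing):

#include <mutex>

struct delete_engine::impl
{
    std::mutex _mutex;
    bool _deleting { false };
    std::deque<std::unique_ptr<node>> _delete_list;

    void queue_for_delete(std::unique_ptr<node> p)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _delete_list.push_front(std::move(p));
        if (_deleting)
            return;                // another frame is already draining the queue
        _deleting = true;
        while (!_delete_list.empty())
        {
            auto victim = std::move(_delete_list.back());
            _delete_list.pop_back();
            lock.unlock();         // don't hold the lock while ~node runs
            victim.reset();        // may re-enter queue_for_delete
            lock.lock();
        }
        _deleting = false;
    }
};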
std::shared_ptr (and before that, boost::shared_ptr) is and was the de-facto standard for building dynamic systems involving massive DAGs.
In reality, DAGs don't get that deep (maybe 10 or 12 algorithms deep in your average FX pricing server?) so the recursive deletes are not a problem.
If you're thinking of building an enormous DAG with a depth of 10,000 then it might start to be a problem, but to be honest I think it will be the least of your worries.
Re the analogy of a DAG being like a linked list... not really. Since it's acyclic, all your pointers pointing "up" will need to be shared_ptrs, and all your back-pointers (e.g. binding message subscriptions to sink algorithms) will need to be weak_ptrs, which you lock as you fire the message.
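A minimal sketch of that back-pointer pattern (Message, Algorithm, and on_message are hypothetical names, not from the original):

#include <memory>

struct Message { /* payload */ };

struct Algorithm {
    void on_message(const Message& m) { /* consume the message */ }
};

struct Subscription {
    std::weak_ptr<Algorithm> sink;  // back-pointer: does not keep the sink alive

    void fire(const Message& m) {
        if (auto p = sink.lock())   // promote to shared_ptr just for the call
            p->on_message(m);       // safe even if the sink is being torn down
    }
};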
disclaimer: I've spent a lot of time designing and building information systems based on directed acyclic graphs of parameterised algorithm components, with a great deal of sharing of common components (i.e. same algorithm with same parameters).
Performance of the graph is never an issue. The bottlenecks are:
initially building the graph when the program starts - there's a lot of noise at that point, but it only happens once.
getting data into and out of the process (usually a message bus). This is invariably the bottleneck as it involves I/O.
I wrote a program to create a linked list, and I got undefined behavior (or I assume I did, given the program just stopped without any error) when I increased the size of the list to a certain degree and, critically, attempted to delete it (through ending its scope). A basic version of the code is below:
#include <iostream>
#include <memory>
template<typename T> struct Nodeptr;
template<class T>
struct Node {
Nodeptr<T> next;
T data;
Node(const T& data) : data(data) { }
};
template<class T>
struct Nodeptr : public std::shared_ptr<Node<T>> {
Nodeptr() : std::shared_ptr<Node<T>>(nullptr) { }
Nodeptr(const T& data) : std::shared_ptr<Node<T>>(new Node<T>(data)) { }
};
template<class T>
struct LinkedList {
Nodeptr<T> head;
void prepend(const T& data) {
auto new_head = Nodeptr<T>(data);
new_head->next = head;
head = new_head;
}
};
int main() {
int iterations = 10000;
{
LinkedList<float> ls;
std::cout << "START\n";
for(float k = 0.0f; k < iterations; k++) {
ls.prepend(k);
}
std::cout << "COMPLETE\n";
}
std::cout << "DONE\n";
return 0;
}
Right now, when the code is run, START and COMPLETE are printed, while DONE is not. The program exits prematurely without an error (for some reason).
When I decrease the variable to, say, 5000 instead of 10000, it works just fine and DONE is printed. When I delete the curly braces around the LinkedList declaration/testing block (taking it out of its smaller scope, so that it is NOT deleted before DONE is printed), everything works fine and DONE is printed. Therefore, the error must be arising from the deletion process, and specifically from the volume of things being deleted. There is, however, no error message telling me that there is no more space left on the heap, and 10000 floats seems like awfully little to be filling up the heap anyhow. Any help would be appreciated!
Solved! It now works directly off of heap pointers, and I changed Node's destructor to prevent recursive calls to it:
~Node() {
if(!next) return;
Node* ptr = next;
Node* temp = nullptr;
while(ptr) {
temp = ptr->next;
ptr->next = nullptr;
delete ptr;
ptr = temp;
}
}
It is a stack overflow caused by recursive destructor calls.
This is a common issue with smart pointers one should be aware of when writing any deeply-nested data structure.
You need an explicit destructor for Node that removes elements iteratively by resetting the smart pointers, starting from the tail of the list. Also follow the rule of 3/5 and do the same for all other operations that might destroy nodes recursively.
Because this essentially rewrites all object destruction, it does, however, make the use of smart pointers in the first place somewhat questionable, although there is still some benefit in preventing double-delete (and leak) mistakes. It is therefore common to simply not use smart pointers in such a situation at all, and instead fall back to raw pointers and manual lifetime management for the data structure's nodes.
Also, there is no need to use std::shared_ptr here in any case. There is only one owner per node. It should be std::unique_ptr. std::shared_ptr has a very significant performance impact and also has the issue of potentially causing leaks if circular references are formed (whether intentionally or not).
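A minimal sketch of the iterative destructor, assuming the list is reworked to use std::unique_ptr as suggested:

#include <memory>

template<class T>
struct Node {
    std::unique_ptr<Node<T>> next;
    T data;
    explicit Node(const T& data) : data(data) {}
};

template<class T>
struct LinkedList {
    std::unique_ptr<Node<T>> head;

    void prepend(const T& data) {
        auto new_head = std::make_unique<Node<T>>(data);
        new_head->next = std::move(head);
        head = std::move(new_head);
    }

    ~LinkedList() {
        // Unlink one node per iteration: the next pointer is moved out
        // before the old head dies, so ~Node never recurses down the chain.
        while (head)
            head = std::move(head->next);
    }
};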
I also don't see any point in having Nodeptr as a separate class. It seems to exist just to be an alias for something like std::make_shared<Node>, which can be achieved by just using a function instead. Especially inheriting from a standard library smart pointer seems dubious; if at all, I would use composition instead.
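The replacement could be as simple as this sketch (written against the asker's original shared_ptr design):

template<class T>
std::shared_ptr<Node<T>> make_node(const T& data) {
    return std::make_shared<Node<T>>(data);
}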
As part of an exercise my university has tasked me with, I have written a small Graph implementation, following this header.
class Node {
private:
std::string name;
std::vector<Node*> children;
public:
Node(const std::string& name="");
virtual ~Node();
};
When writing code for the destructor ~Node(), I noticed that my implementation fails when the graph contains a cycle. This is my implementation so far, which obviously doesn't work if the graph contains a cycle.
Node::~Node() {
for (Node* n : children) {
delete n;
n = NULL;
}
children.clear();
}
I am uncertain how to most elegantly write a destructor that can handle cycles in the graph.
Please note that I was specifically tasked to write a recursive destructor.
Thank you for your answers!
Option 1: Choose a representation for the graph where nodes are not owned by other nodes, but rather by the graph, which would be a distinct object. This way the node destructor doesn't need to do anything. This won't satisfy the requirement of recursion:
struct Graph {
std::vector<std::unique_ptr<Node>> nodes;
};
Note that if there is no inheritance involved, then you could simply use std::vector<Node>. I assume that there is, due to the usage of a virtual destructor in Node.
Alternatively, you could use another representation for the graph, such as an adjacency list.
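For example, a sketch of an adjacency-list representation:

#include <cstddef>
#include <string>
#include <vector>

// The graph owns all nodes by value; edges are plain indices, so cycles are
// harmless and destruction is trivially non-recursive.
struct Graph {
    std::vector<std::string> names;             // payload of node i
    std::vector<std::vector<std::size_t>> adj;  // adj[i]: children of node i
};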
Option 2: Use an algorithm to generate a minimum spanning forest of the graph. Then recursively delete the roots of each spanning tree. You can, for example, use Kruskal's algorithm (since deletion doesn't care about edge weights, any spanning forest, e.g. one produced by a DFS, works just as well). Given your representation, it looks like your graph may be connected, in which case there would be only one spanning tree.
One option could be to first create an unordered_set of all the Node*s and then to delete them.
void fill(std::unordered_set<Node*>& to_delete, Node* ptr) {
// try to insert ptr and return if it was already in the set
if(not to_delete.emplace(ptr).second) return;
// swap ptr->children with an empty vector
std::vector<Node*> tmp;
std::swap(tmp, ptr->children);
for(Node* c : tmp) // loop over the pointers
fill(to_delete, c); // fill recursively
}
virtual ~Node() noexcept { // std::terminate if anything should throw
if(children.empty()) return; // nothing to do here
std::unordered_set<Node*> to_delete; // to collect all the Node*'s
fill(to_delete, this); // fill the set recursively
to_delete.erase(this); // don't delete "this"
for(auto c : to_delete) // delete all - they have no children by now
delete c;
}
If your graph is a tree (which I assume, since your implementation of the destructor is valid only for a tree) and you can store a pointer to the parent in each Node, then you can write an iterative version which does not require any extra data structure to avoid recursion.
Also learn to use smart pointers.
class Node {
private:
std::string name;
std::vector<std::unique_ptr<Node>> children;
Node* parent;
void iterativeCleanChildren();
public:
Node(std::string name = "", Node* parent = nullptr)
    : name{std::move(name)}, parent{parent}
{}
~Node() {
iterativeCleanChildren();
}
void addChild(std::string name) {
children.emplace_back(std::make_unique<Node>(std::move(name), this));
}
};
void Node::iterativeCleanChildren()
{
    auto p = this;
    while (p != this || !p->children.empty()) {
        while (!p->children.empty()) {
            p = p->children.back().get(); // go as deep as possible
        }
        if (p != this) {
            p = p->parent;          // go back to the parent
            p->children.pop_back(); // destroy the leaf that was just found
        }
    }
}
How does this work?
First it finds the (right-most) leaf in the tree (a node which has no children).
Then it goes back to that leaf's parent and removes the child which was just found with p->children.pop_back(); (this destroys the unique_ptr of the leaf).
Then it finds the next leaf, and so on.
This tree clearing continues until the root (this) node is reached and has no children left.
This way the root node ends up with no children at all, and since the implementation is iterative, a stack overflow is impossible. It doesn't matter how unbalanced the tree is.
I am working on a medium-sized C++ framework making use of the visitor pattern.
A valgrind test of a program implementing this framework reported a number of memory leaks that could be tracked down to one of the visitors, namely the copyCreator.
template<typename copyNodeType>
struct copyCreator {
copyCreator() {}
copyCreator(node * firstVisit) {
firstVisit->accept(*this);
}
~copyCreator() {
copy.reset();
for(auto ptr : openList) {
delete ptr;
}
}
std::unique_ptr<copyNodeType> copy = nullptr;
std::vector<nonterminalNode *> openList;
// push to tree
template<typename nodeType>
void push(nodeType * ptr) {
if (copy) {
// if root is set, append to tree
openList.back()->add_child(ptr);
}
else {
auto temp = dynamic_cast<copyNodeType *>(ptr);
if(temp) {
copy = std::unique_ptr<copyNodeType>(temp);
}
}
}
// ...
void visit(struct someNonterminalNode & nod) {
auto next = new someNonterminalNode(); //This is leaked
push(next);
openList.push_back(next);
nod.child->accept(*this);
openList.pop_back();
};
There are two main reasons why I am confused about this:
The two different constructors cause a different number of leaks
The leaks are reported to occur during visits
The accept methods of all nodes simply trigger a standard double dispatch to the visit method of the correct visitor.
I am fairly new to C++ programming and might have overlooked some really fundamental issue.
copyCreator<nodeType>::push(ptr) is supposed to take ownership of ptr. But it fails to do so if (a) ptr is not of type nodeType* (as determined by dynamic_cast), and (b) no node of type nodeType has been visited yet.
In other words, copyCreator<nodeType> creates, and promptly leaks, copies of all nodes until it encounters one of type nodeType.
This is precisely what happens in copyCreator<programNode> cpy2(&globalScope, a);, where a is forallNode*. cpy2 expects to encounter programNode (which it never does), and meanwhile, it copies and leaks all other nodes.
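One way to plug the leak, as a sketch (not the original author's code): since visit() still uses the raw pointer after push() returns, push() cannot simply delete a node it fails to place. Instead, the visitor can keep such orphans in a hypothetical extra member and free them in its destructor:

std::vector<node *> orphans; // hypothetical extra member of copyCreator

template<typename nodeType>
void push(nodeType * ptr) {
    if (copy) {
        openList.back()->add_child(ptr);  // the tree takes ownership
    }
    else if (auto temp = dynamic_cast<copyNodeType *>(ptr)) {
        copy = std::unique_ptr<copyNodeType>(temp);
    }
    else {
        orphans.push_back(ptr);           // nobody owns it yet; remember it
    }
}

~copyCreator() {
    copy.reset();
    for (auto ptr : openList) delete ptr;
    for (auto ptr : orphans) delete ptr;  // free everything copied before
}                                         // the first copyNodeType appeared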
I have two classes: "node" and "poly". The node objects are linked together to form a linked list. The poly object holds a pointer to the first node. I am trying to deallocate the memory for the entire "poly". I want to delete the poly - then within the poly destructor call a function (something like "freePoly") that will help me iterate through the entire linked list of node objects - deleting all nodes.
Here is the class definition:
class Node
{
private:
double coeff;
int exponent;
Node *next;
public:
Node(double c, int e, Node *nodeobjectPtr)
{
coeff = c;
exponent = e;
next = nodeobjectPtr;
}
~Node()
{
printf("Node Destroyed");
//???
    }
};
class poly
{
private:
Node *start;
public:
poly(Node *head) /*constructor function*/
{
start = head;
}
~poly() /*destructor*/
{
//???
}
void freePoly();
};
void poly::freePoly()
{
//???
}
I've tried a lot of things, but essentially I get stuck where I'm only deleting the first node object. Then I've lost the pointer to the other nodes... and leak memory because I can't access them anymore for deletion.
You can avoid a lot of problems and work by using a std::vector instead of a do-it-yourself linked list (unless this is for learning).
That said, do
~poly() /*destructor*/
{
while( start != 0 )
{
Node* p_doomed = start;
start = start->next;
delete p_doomed;
}
}
There are also many other ways to do this, but the above is a pattern that can help you figure out how to do similar things.
If you want to keep your code as is, then your freePoly should look like this:
while(start)
{
Node *ptr = start;
start = start->getNext();
delete ptr;
}
Notice what this code does: first it makes a copy of the pointer to the current head (i.e. the first Node), then it makes the head point to the next object, and only then calls delete on the old head pointer.
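Note that this assumes Node exposes its next pointer through an accessor, which the class shown above does not have yet; something like:

class Node
{
    // ... as before ...
public:
    Node* getNext() const { return next; }
};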
Of course, this design isn't ideal: you are programming using a C++ compiler and you have some classes and some member functions but you aren't really using C++ effectively:
Not only does the language provide you with wonderful tools such as std::list and std::vector, so that you don't have to reinvent the wheel, and things like std::unique_ptr, so that pointers know when it's safe to delete themselves.
It also provides you with powerful abstract concepts to model the behavior of objects.
I suggest that you take a step back and rethink your code. Key questions to ask are: how can I write less code here? What facilities of C++ can I leverage here? What are these objects I have and what does each object do?
I understand that this may be a homework exercise and you have to implement things a certain way, but don't let that stop you from learning.
You could try this:
private:
void auxDestroy(Node* p);
void Node::auxDestroy(Node* p) {
    if (p != 0) {
        auxDestroy(p->next); // destroy the rest of the chain first
        p->next = 0;         // so that deleting p doesn't recurse again
        delete p;
    }
}
Then in the destructor you could call auxDestroy(this->next). (Note that this approach is still recursive, one call per node, so it can itself overflow the stack on very long lists.)
~Node(){
auxDestroy(this->next);
}
And in poly destructor:
~poly(){
delete this->start;
}
I am trying to make a linked-list-based queue for fast operations. Here is what I have:
template< typename T,
template< typename > class Allocator = default_allocator
>
class fast_queue
{
public:
typedef T element_type;
typedef T* element_ptr;
private:
typedef struct e_node
{
element_type val;
e_node * next;
}node_type;
typedef node_type * node_ptr;
node_ptr parent;
node_ptr child;
int m_size;
typedef Allocator< node_type > allocator_type;
public:
fast_queue()
: parent( 0 )
, child( 0 )
, m_size( 0 )
{
}
~fast_queue() {
this->clear();
}
void push(element_type& i)
{
node_ptr n = allocator_type::alloc();
n->val = i;
n->next = (node_ptr)0;
if(child) {
child->next = n;
} else {
parent = n;
}
child = n;
m_size++;
}
element_type pop()
{
element_type ret = parent->val;
node_ptr p = parent;
parent = parent->next;
m_size--;
allocator_type::dealloc(p);
return ret;
}
inline int size() const
{
return m_size;
}
inline bool empty() const
{
return (m_size == 0);
}
void clear()
{
while(!empty()) {
pop();
}
child = 0;
}
};
Pretty straightforward. What I'm having a problem with is the clear() function.
It seems to be taking way too much time deallocating all the nodes in the queue (7 seconds).
So the question is: what might be a better algorithm? I tried to understand MSVC's implementation of std::deque, but the code is too bulky for me to understand.
EDIT:
The queue should be generic, allowing arbitrary types of data to be queued.
Here is my testing code (Windows)
DWORD time1 = timeGetTime();
fast_queue<int> queue;
for(int i = 0; i < 100000; i++) {
queue.push(i);
}
queue.clear();
cout << "Test time: " << (int)(timeGetTime() - time1) << " milliseconds" << endl;
You're constructing a linked list. The deque implementation stores many, many elements in each allocation. You, however, allocate and deallocate individually for each element. This is why your queue is so slow.
In addition to this, the Standard's queue interface says that you should take a complete Allocator type, not a template, although the reality of fulfilling the Standard's allocator requirements is that it must be a template anyway.
There's not much you can do by changing the push/pop/clear algorithms, because 95% of the time goes to allocation and deallocation of the nodes. But there are some things you could do:
1) Use some kind of memory pool for the nodes. You could either use a pool allocator (boost::pool_allocator is a good one if you don't want to implement your own), or you could use an internal node cache in the queue class. So instead of deleting nodes you just push them to the node cache, and when creating nodes you pop them from the cache (see the sketch after this list).
2) Store multiple items in one node. For example if you have 8 items in one node you only have to alloc/dealloc once every 8 pushes/pops. Of course this requires slightly more complicated code; in addition to having pointers to the head and tail nodes, you would also need index variables for both of them to keep track of how many items are actually in use.
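A minimal sketch of the node-cache idea from 1), as extra members of fast_queue (names are hypothetical):

node_ptr cache { 0 }; // free list of spare nodes

node_ptr acquire_node()
{
    if (cache) {                     // reuse a cached node if possible
        node_ptr n = cache;
        cache = cache->next;
        return n;
    }
    return allocator_type::alloc();  // otherwise hit the real allocator
}

void release_node(node_ptr n)
{
    n->next = cache;                 // push onto the free list instead of
    cache = n;                       // deallocating
}

push() and pop() would then call acquire_node()/release_node() instead of alloc()/dealloc(), and the destructor would hand the cached nodes back to the allocator.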
I had to take out all the Allocator stuff to get it to compile (under g++ 4.4), but it runs in no time at all. Even if I push 100 times more elements, it only takes about half a second to populate the queue and half a second to clear it. Have you tried using new and delete?
As suggested by other people, memory allocation/deallocation is the largest performance problem here.
I suggest that you try boost::circular_buffer. If its capacity is set high enough, it will only perform one memory allocation over its lifetime.
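A sketch of what that looks like for the test above (circular_buffer allocates its whole capacity up front):

#include <boost/circular_buffer.hpp>

int main()
{
    boost::circular_buffer<int> queue(100000); // single allocation, fixed capacity
    for (int i = 0; i < 100000; i++)
        queue.push_back(i);                    // no per-element allocation
    while (!queue.empty())
        queue.pop_front();                     // no per-element deallocation
}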