Delete on a very deep tree - c++

I am building a suffix trie (unfortunately, I have no time to properly implement a suffix tree) for a 10-character alphabet. The strings I wish to parse are going to be rather long (up to 1M characters). The tree is constructed without any problems; however, I run into some when I try to free the memory after I am done with it.
In particular, if I set up my constructor and destructor as follows (where CNode.child is a pointer to an array of 10 pointers to other CNodes, and count is a simple unsigned int):
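CNode::CNode(){
    count = 0;
    child = new CNode* [10];
    memset(child, 0, sizeof(CNode*) * 10);
}
CNode::~CNode(){
    for (int i=0; i<10; i++)
        delete child[i];   // recursive: each child's destructor finishes before this one does
    delete[] child;        // also free the array of child pointers
}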
I get a stack overflow when trying to delete the root node. I might be wrong, but I am fairly certain that this is due to the depth of the destructor calls: each destructor calls up to 10 other destructors before it returns, and the chain of nested calls can be as deep as the tree itself. I know this is suboptimal both space- and time-wise; however, this is supposed to be a quick-and-dirty solution to the repeated substring problem.
tl;dr: how would one go about freeing the memory occupied by a very deep tree?
Thank you for your time.

One option is to allocate from a large buffer and then deallocate that buffer all at once.
For example (untested):
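#include <vector>

// Assumes ~CNode() no longer deletes its children: the buffer owns every node.
class CNodeBuffer {
private:
    std::vector<CNode *> nodes;
public:
    CNodeBuffer() = default;
    CNodeBuffer(const CNodeBuffer&) = delete;             // it owns raw pointers, so
    CNodeBuffer& operator=(const CNodeBuffer&) = delete;  // a copy would double-delete
    ~CNodeBuffer() {
        empty();
    }
    CNode *get() {
        CNode *node = new CNode();
        nodes.push_back(node);
        return node;
    }
    void empty() {
        for(std::vector<CNode *>::iterator i = nodes.begin(); i != nodes.end(); ++i) {
            delete *i;
        }
        nodes = std::vector<CNode *>();  // drop the pointers and release the capacity
    }
};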
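Note that pointers to a std::vector's elements are not stable: they are invalidated whenever the vector reallocates. A std::vector<CNode> therefore only works if you reserve() enough capacity for every node up front. std::deque<CNode> is the simpler alternative, because push_back on a deque never moves existing elements.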
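A minimal sketch of the std::deque variant, applied to the trie from the question. `TrieArena`, `Node` and `add_suffixes` are hypothetical names, and the sketch assumes the nodes no longer own their children: all of them are owned by the arena, so freeing the whole trie is just the destruction of one deque, which involves no recursion however deep the trie is.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical trie node: the child pointers point into the arena and own
// nothing, so Node needs no destructor at all.
struct Node {
    unsigned count = 0;
    std::array<Node*, 10> child{};  // all ten children start as nullptr
};

// Owns every node. Destroying the arena frees all of them without recursion.
class TrieArena {
public:
    TrieArena() { nodes_.emplace_back(); }     // the root
    TrieArena(const TrieArena&) = delete;      // copies would point into the original
    TrieArena& operator=(const TrieArena&) = delete;
    Node* root() { return &nodes_.front(); }
    Node* make() {
        nodes_.emplace_back();                 // appending to a deque keeps the
        return &nodes_.back();                 // addresses of existing nodes valid
    }
    std::size_t size() const { return nodes_.size(); }
private:
    std::deque<Node> nodes_;
};

// Insert every suffix of a string of decimal digits into the trie.
void add_suffixes(TrieArena& arena, const std::string& s) {
    for (std::size_t start = 0; start < s.size(); ++start) {
        Node* cur = arena.root();
        for (std::size_t i = start; i < s.size(); ++i) {
            int d = s[i] - '0';
            assert(d >= 0 && d < 10);          // input must be '0'..'9'
            if (!cur->child[d]) cur->child[d] = arena.make();
            cur = cur->child[d];
            ++cur->count;
        }
    }
}
```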

Do you initialize the memory for the nodes themselves? From what I can see, your code only allocates memory for the pointers, not the actual nodes.
As far as your question goes, try to traverse the tree iteratively rather than recursively. Deep recursion is nice on paper, but in real code every level consumes stack space, so a tree that is a million levels deep will overflow the stack.
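A minimal sketch of an iterative delete, assuming the question's node layout but with a destructor that frees only the node's own child array (if ~CNode() still deleted the children, every `delete` would start the recursion again). The std::vector acts as an explicit stack on the heap, so the depth it can handle is limited by memory rather than by the thread's stack size. `destroy_tree` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CNode {
    unsigned count = 0;
    CNode** child;
    CNode() : child(new CNode*[10]()) {}  // ten null child pointers
    ~CNode() { delete[] child; }          // frees only its own array, not the subtrees
    CNode(const CNode&) = delete;
    CNode& operator=(const CNode&) = delete;
};

// Delete a whole tree without recursion; returns the number of nodes freed.
std::size_t destroy_tree(CNode* root) {
    std::size_t freed = 0;
    std::vector<CNode*> pending;          // explicit stack of nodes still to delete
    if (root) pending.push_back(root);
    while (!pending.empty()) {
        CNode* node = pending.back();
        pending.pop_back();
        for (int i = 0; i < 10; ++i)
            if (node->child[i]) pending.push_back(node->child[i]);
        delete node;                      // safe: its children are already in `pending`
        ++freed;
    }
    return freed;
}
```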

Have you considered just increasing your stack size?
In Visual Studio you can do it with the /F compiler option (/F NUMBER, where NUMBER is the stack size in bytes). You might also need to specify the linker option /STACK:reserve[,commit].

You're going to do quite a few deletes. That will take a lot of time, because you will access memory in a very haphazard way. However, at that point you don't need the tree structure anymore. Hence, I would split the work into separate passes. In the first pass, create a std::vector<CNode*>, and reserve() enough space for all nodes in your tree. Now walk the tree and copy all CNode*'s to your vector (iteratively, since a recursive walk would overflow the stack just like the destructor did). In the second step, sort them (!). Then, in the third step, delete all of them; for this, ~CNode() must no longer delete the children itself. The second step is technically optional but likely makes the third step a lot faster. If not, try sorting in reverse order.
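A sketch of those three steps, under the same assumption that ~CNode() no longer deletes the children. The collection pass uses the vector itself as a work queue, so it does not recurse. Sorting uses std::less, which gives a total order on pointers even where the built-in < does not. Whether the sort actually speeds up the deletes depends on the allocator, so it is worth measuring. `collect_nodes` and `delete_sorted` are hypothetical names, and `expected` is whatever node count you track while building the tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct CNode {
    CNode** child;
    CNode() : child(new CNode*[10]()) {}
    ~CNode() { delete[] child; }          // does NOT delete the subtrees
    CNode(const CNode&) = delete;
    CNode& operator=(const CNode&) = delete;
};

// Pass 1: gather every node pointer; `all` doubles as the work queue.
std::vector<CNode*> collect_nodes(CNode* root, std::size_t expected) {
    std::vector<CNode*> all;
    all.reserve(expected);                // avoids regrowth when the count is known
    if (!root) return all;
    all.push_back(root);
    for (std::size_t i = 0; i < all.size(); ++i)
        for (int c = 0; c < 10; ++c)
            if (all[i]->child[c]) all.push_back(all[i]->child[c]);
    return all;
}

// Passes 2 and 3: sort by address, then delete in that order.
std::size_t delete_sorted(std::vector<CNode*>& nodes) {
    std::sort(nodes.begin(), nodes.end(), std::less<CNode*>());
    for (CNode* n : nodes) delete n;
    std::size_t freed = nodes.size();
    nodes.clear();
    return freed;
}
```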

I think in this case a breadth-first cleanup might help, by putting all the back-tracking information into a deque rather than on the OS provided stack. It still won't pleasantly solve the problem of making it happen in the destructor though.
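Sketch (assuming ~CNode() frees only its own child array and no longer deletes the children):
#include <deque>

// Deletes this node and every descendant, breadth-first, without recursion.
// It deletes `this` too, so call it only on a root allocated with new and
// do not touch the object afterwards.
void CNode::cleanup()
{
    std::deque<CNode*> nodes;
    nodes.push_back(this);
    while(!nodes.empty())
    {
        CNode* node = nodes.front();  // get the front node...
        nodes.pop_front();            // ...and remove it from the deque
        for (int i = 0; i < 10; i++)  // put all non-null children at the end
            if (node->child[i])
                nodes.push_back(node->child[i]);
        delete node;                  // its children are already queued
    }
}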
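If you do want `delete root` to work, one common pattern is a destructor that detaches all descendants into its own heap-allocated worklist before deleting them, so every nested destructor finds only null children and returns immediately. This is a sketch assuming each node is owned by exactly one parent; the `live` counter exists only to make the example checkable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CNode {
    static std::size_t live;  // instance counter, for demonstration only
    CNode** child;
    CNode() : child(new CNode*[10]()) { ++live; }
    CNode(const CNode&) = delete;
    CNode& operator=(const CNode&) = delete;

    // Iterative destructor: move the descendants onto a worklist and null out
    // the links first, so each `delete` below never recurses more than one level.
    ~CNode() {
        std::vector<CNode*> pending;
        for (int i = 0; i < 10; ++i)
            if (child[i]) { pending.push_back(child[i]); child[i] = nullptr; }
        while (!pending.empty()) {
            CNode* node = pending.back();
            pending.pop_back();
            for (int i = 0; i < 10; ++i)
                if (node->child[i]) {
                    pending.push_back(node->child[i]);
                    node->child[i] = nullptr;
                }
            delete node;      // its destructor sees only null children
        }
        delete[] child;
        --live;
    }
};
std::size_t CNode::live = 0;
```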

Related

C++ avoid dynamic memory allocation

Imagine I have some Node struct that contains pointers to the left and right children and some data:
struct Node {
    int data;
    Node *left;
    Node *right;
};
Now I want to do some state space search, and naturally I want to construct the graph as I go. So I will have a kind of loop that will have to create Nodes and keep them around. Something like:
Node *curNode = ... ; // starting node
while (!done) {
    // ...
    curNode->left = new Node();
    curNode->right = new Node();
    // ..
    // Go left (for example)
    curNode = curNode->left;
}
The problem is that I have to dynamically allocate node on each iteration, which is slow. So the question is: how can I have pointers to some memory but not by allocating it one by one?
The first solution I thought of is to have a std::vector<Node> that will contain all the allocated nodes. The problem is that when we push_back elements, all references might be invalidated, so all my left/right pointers will be garbage.
The second solution is to allocate a big chunk of memory upfront, and then we just grab the next available pointer when we want to create a Node. To avoid references invalidation, we just have to create a linked list of big chunks of memory when we exceed the capacity of the current chunk so every given pointer stays valid. I think that std::deque behaves like this, but it's not explicitly created for this.
Another solution would be to store vector indices instead of pointers but this is not a solution because a Node doesn't want to be associated with any container, it wants the pointer directly.
So what is a good solution here that avoids having to allocate new nodes on each iteration?
You can use std::deque<Node>, and it will do memory management for you, allocating elements in blocks and not invalidating pointers as long as you only add or remove elements at the ends (never in the middle). Though if you want more precise control over how many elements go in a block, you can quite simply create something like this:
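#include <array>
#include <cstddef>
#include <list>

class NodePool {
    static constexpr std::size_t blockSize = 512;  // must be static to be constexpr
    using Block = std::array<Node,blockSize>;
    using Pool = std::list<Block>;                 // list nodes never move, so pointers stay valid
    std::size_t allocated = blockSize;             // forces a new block on first use
    Pool pool;
public:
    Node *allocate()
    {
        if( allocated == blockSize ) {
            pool.emplace_back();                   // constructs a whole block of Nodes
            allocated = 0;
        }
        return &( pool.back()[ allocated++ ] );
    }
};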
I did not try to compile it, but it should be enough to express the idea. By changing blockSize you can fine-tune the performance of your program. Be aware, though, that Node objects will be fully constructed a whole block at a time (unlike how std::deque does it). Avoiding that means allocating raw storage (for example with std::allocator<Node>) and constructing each Node in it with placement new only when it is handed out; that is standard-conformant, but more work.
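For comparison, a minimal sketch of the std::deque route with the Node struct from the question. `Graph` and `make` are hypothetical names. std::deque::push_back invalidates iterators but not pointers or references to existing elements, which is exactly what the left/right links rely on.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Node {
    int data;
    Node *left;
    Node *right;
};

// Hypothetical graph builder: every Node lives in `storage_`, so the
// left/right pointers stay valid as long as we only append to it.
class Graph {
public:
    Node* make(int data) {
        storage_.push_back(Node{data, nullptr, nullptr});
        return &storage_.back();
    }
    std::size_t size() const { return storage_.size(); }
private:
    std::deque<Node> storage_;  // allocates in blocks; push_back never moves elements
};
```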

Use of pointer to vector which involved the use of 'new'

I would like to create a vector of pointers to struct
vector<myStruct*> vec;
For elements in the vector, not all of them contain data. Some of them may point to NULL.
So, should I first create space with new for each of the elements?
for(size_t i = 0; i < vec.size(); i++){
    if (thisSpaceIsValid(i))
        vec.at(i) = new myStruct;
    else
        vec.at(i) = NULL;
}
The problems are:
- If I use new for each element, it will be very slow. How can I speed it up a bit? Is there a way to create all the space I need at once and automatically assign pointers into that space to the vector (vec here)?
- If I later use delete to free the memory, will speed still be a problem?
If I use "new" for each element, it will be very slow. How can I speed it up a bit? Is there a way to create all the space I need at once and automatically assign pointers into that space to the vector ("vec" here)?
You can do that.
Let's say the size of your vector is M and you only need N of those elements to have pointers to objects and other elements are null pointers. You can use:
myStruct* objects = new myStruct[N];
and then, use:
for(size_t i = 0, j = 0; i < vec.size(); i++)
{
    if (thisSpaceIsValid(i))
    {
        if ( j == N )
        {
            // Error. Do something.
        }
        else
        {
            vec[i] = objects+j;
            ++j;
        }
    }
    else
    {
        vec[i] = NULL;
    }
}
You now have to make sure that you keep track of the value of objects so you can safely deallocate the memory by using
delete [] objects;
PS
There might be a better and more elegant solution to your problem. It will be worth your while to spend a bit more time thinking over that.
EDIT:
After reading the question again, it seems I misunderstood the question. So here is an edited answer.
If you only need to execute the code during some kind of initialization phase, you can create all the instances of myStruct in an array and then just point to those from the vector as already proposed by R Sahu. Note that the solution requires you to create and delete all instances at the same time.
However, if you execute this code several times and/or don't know exactly how many myStruct instances you will need, you could overload operator new and operator delete for the struct and handle memory allocation yourself.
See Calling object constructor/destructor with a custom allocator for an example of this. See the answer by Jerry Coffin.
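As a rough illustration of that idea (not the code from the linked answer), here is a class-specific operator new/delete that carves objects out of chunks of 1024 and recycles freed ones through a free list. The members of `myStruct`, the chunk size and the helper names are assumptions; in this sketch chunks are never returned to the system and nothing is thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct myStruct final {          // `final`: no derived class can request a bigger size
    int a = 0;                   // placeholder members
    double b = 0.0;

    static void* operator new(std::size_t size) {
        assert(size == sizeof(myStruct));
        if (!head()) refill();
        FreeSlot* slot = head();
        head() = slot->next;     // pop a recycled or fresh slot
        return slot;
    }
    static void operator delete(void* p) noexcept {
        if (p) head() = ::new (p) FreeSlot{head()};  // push the slot back on the list
    }

private:
    struct FreeSlot { FreeSlot* next; };
    static constexpr std::size_t kChunk = 1024;      // objects per chunk (tuning knob)
    static FreeSlot*& head() { static FreeSlot* h = nullptr; return h; }
    static void refill() {
        // One ::operator new call provides storage for kChunk objects.
        char* chunk = static_cast<char*>(::operator new(kChunk * sizeof(myStruct)));
        for (std::size_t i = 0; i < kChunk; ++i)
            head() = ::new (chunk + i * sizeof(myStruct)) FreeSlot{head()};
    }
};
static_assert(sizeof(myStruct) >= sizeof(void*), "a slot must hold a free-list link");
static_assert(alignof(myStruct) >= alignof(void*), "a slot must be aligned for a link");
```

Note that this only affects `new myStruct`; `new myStruct[N]` still goes through the global operator new[].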
BTW, you don't need vec.at(i), as you are iterating from 0 to size. vec[i] is okay and should perform a bit better.
OLD ANSWER:
You can do
vector<myStruct*> vec(10000, nullptr);
to generate a vector with, for instance, 10000 elements, all initialized to nullptr.
After that you can fill the relevant elements with pointers to the struct.
To delete, just use
for (auto e : vec) delete e;
because it is safe to delete a nullptr.
If you need a vector of pointers and would like to avoid calling new, first create a container of the structs themselves, then assign pointers to its elements into your vec. Be careful when choosing the container of structs. If you use a vector of structs, make sure to reserve capacity for all elements in advance; otherwise its elements may move to a different memory location when the vector grows. A deque, on the other hand, guarantees that its elements don't move as long as you only add or remove elements at the ends.
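A minimal sketch with std::deque as the owning container. The member of `myStruct`, the body of `thisSpaceIsValid` and the name `point_into_storage` are placeholders. All objects are freed when `storage` is destroyed, so no delete loop is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct myStruct { int value = 0; };  // placeholder for the real struct

// Hypothetical predicate standing in for the question's thisSpaceIsValid().
bool thisSpaceIsValid(std::size_t i) { return i % 3 == 0; }

// `storage` owns the objects; `vec` receives pointers into it
// (nullptr where the slot is not valid). No per-element new/delete.
void point_into_storage(std::deque<myStruct>& storage, std::vector<myStruct*>& vec) {
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (thisSpaceIsValid(i)) {
            storage.emplace_back();    // appending never moves existing elements
            storage.back().value = static_cast<int>(i);
            vec[i] = &storage.back();
        } else {
            vec[i] = nullptr;
        }
    }
}
```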
Multiple small new and delete calls should be avoided if possible in C++ when performance matters a lot.
The more I think about it, the less I like @RSahu's solution. In particular, I feel memory management in this scenario would be a nightmare. Instead, I suggest using a vector of unique_ptrs owning memory allocated via a custom allocator. I believe a sequential allocator would do.

Populate an unordered queue of pointers to vector elements

I have a data structure that works like an unordered queue and a vector filled with objects of class A. I want to populate the queue one element at a time (using a push() function) with pointers to each of the objects in the vector.
This implementation needs to:
1. Keep track of the original order of the objects in the vector even as the pointers stored in the queue swap positions in accordance with a comparator and the values of the objects
2. Allow for the continued addition of objects to the vector (again, mindful of order)
3. Allow the objects to be edited according to their original order in the vector without needing to recopy everything to the queue (hence, queue of pointers rather than objects)
I've been beating my head against the wall for several hours now in an attempt to figure this out. Right now I have two solutions that both fail for different reasons.
The first is
for(auto i = vector.begin(); i < vector.end(); i++)
{
    queue->push(new A (*i));
}
This one worked perfectly until it came time to edit the elements in vector, at which point I realized that it seemed to have no effect whatsoever on the values in the queue. Maybe the pointers got decoupled somewhere.
The second is
for(A* ptr = vector.data(); ptr <= (vector.data()+vector.size()-1); ptr++)
{
    A** bar = new A*;
    *bar = ptr;
    queue->push(*bar);
}
As best I can tell, this one successfully matches up the pointers with objects in vector, but for some reason I can't determine it causes a core dump (abort) after doing some additional operations on the queue (pop(), max(), etc).
If anyone out there can offer any advice, I would sincerely appreciate it.
Oh, and before I forget: as much as I would love to use shared_ptr or unique_ptr or Boost, I'm limiting this to just the STL and vector, list and deque. No other containers.
Your first and third requirements can be met with pointers, and the implementation is not difficult. I would advise you not to use auto here, since it gives you an iterator object rather than a pointer (if you do use iterators, &*i gives you the address of the element).
Regarding your second requirement, it cannot be done with pointers, since adding things to the vector can trigger a reallocation of memory to increase the vector's capacity, unless you know beforehand the maximum number of objects the vector will hold. To fulfill all your requirements, then, the best solution is to "link" the objects by their vector index instead of by pointers. This is also much simpler.
But then again, if you remove things from the vector, you have to update the entire queue. The most flexible solution, which will allow you to do pretty much everything, is to use lists instead of vectors. But that can have a performance impact, so weigh it before making the choice.
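A sketch of the index-based linking, using only std::vector plus the heap algorithms from <algorithm> to stay within the question's restrictions. `IndexQueue` and `A::priority` are hypothetical; the queue keeps indices ordered by the objects' priority, and after you edit objects in place you call rebuild() to restore the heap order without copying any A.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct A { int priority; };  // placeholder for the question's class A

// Hypothetical max-queue that stores indices into `items` instead of pointers.
// Indices survive reallocation of `items` and record each object's original position.
class IndexQueue {
    // Orders indices by the priority of the objects they refer to.
    struct ByPriority {
        const std::vector<A>* items;
        bool operator()(std::size_t a, std::size_t b) const {
            return (*items)[a].priority < (*items)[b].priority;
        }
    };
    const std::vector<A>& items_;      // must outlive the queue
    std::vector<std::size_t> heap_;    // indices arranged as a max-heap
    ByPriority cmp() const { return ByPriority{&items_}; }
public:
    explicit IndexQueue(const std::vector<A>& items) : items_(items) {}
    bool empty() const { return heap_.empty(); }
    std::size_t max() const { return heap_.front(); }  // index of the top object
    void push(std::size_t index) {
        heap_.push_back(index);
        std::push_heap(heap_.begin(), heap_.end(), cmp());
    }
    void pop() {
        std::pop_heap(heap_.begin(), heap_.end(), cmp());
        heap_.pop_back();
    }
    // Editing objects can break the heap order; this restores it.
    void rebuild() { std::make_heap(heap_.begin(), heap_.end(), cmp()); }
};
```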
To make it work with vector and pointers, here is what I would do:
class A { /* your class here */ };
vector<A> vec;
/* Avoid vector reallocating memory: the pointers below stay valid */
/* only while vec.size() <= MAX_NUMBER_OF_OBJECTS. */
vec.reserve(MAX_NUMBER_OF_OBJECTS);
/* Then, populate the vector. */
/* No need for fully populating it though. */
/* ... */
/* Populate the queue. */
queue<A*> q;
for(size_t i = 0; i < vec.size(); i++){
    q.push(&vec[i]);
}

How to implement a compact linked list with array?

Here is the question of exercise CLRS 10.3-4 I am trying to solve
It is often desirable to keep all elements of a doubly linked list compact in storage,
using, for example, the first m index locations in the multiple-array representation.
(This is the case in a paged, virtual-memory computing environment.) Explain how to implement the procedures ALLOCATE-OBJECT and FREE-OBJECT so that the representation is compact. Assume that there are no pointers to elements of the linked list outside the list itself. (Hint: Use the array implementation of a stack.)
Here is my solution so far:
int free;   // head of the free list
int allocate()
{
    if(free == n+1)
        return 0;
    int tmp = free;
    free = next[free];
    return tmp;
}
void deallocate(int pos)
{
    for(; next[pos] != free; pos = next[pos])
    {
        next[pos] = next[next[pos]];
        prev[pos] = prev[next[pos]];
        key[pos] = key[next[pos]];
    }
    int tmp = free;
    free = pos;
    next[free] = tmp;
}
Now, the problem is: if this is the case, we don't need a linked list. If deletion is O(n), we can implement it using a normal array. Secondly, I have not used the array implementation of a stack either. So where is the catch? How should I start?
You don't have to shrink the list right away. Simply leave a hole and link that hole to your free list. Once you've allocated the memory, it's yours. So let's say your page size is 1K. Your initial allocated list size would then be 1K, even if the list is empty. Now you can add and remove items very efficiently.
Then introduce another method to pack your list, i.e. remove all holes. Keep in mind that after calling the pack-method, all 'references' become invalid.
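A sketch of that approach in the multiple-array representation. `ArrayList`, its 0-based indices and the NIL = -1 sentinel are assumptions (CLRS uses 1-based indices). Allocation reuses holes first, so insertion and deletion stay O(1); pack() is O(n) and rewrites the live elements into slots 0..size-1 in list order, which is why saved indices become invalid.

```cpp
#include <cassert>
#include <vector>

// Hypothetical doubly linked list stored in parallel key/next/prev arrays.
class ArrayList {
public:
    static constexpr int NIL = -1;
    explicit ArrayList(int capacity) : key(capacity), next(capacity), prev(capacity) {}

    int push_front(int k) {          // O(1); returns the slot used, or NIL when full
        int x = allocate();
        if (x == NIL) return NIL;
        key[x] = k; prev[x] = NIL; next[x] = head;
        if (head != NIL) prev[head] = x;
        head = x;
        ++count;
        return x;
    }
    void erase(int x) {              // O(1): unlink, then turn the slot into a hole
        if (prev[x] != NIL) next[prev[x]] = next[x]; else head = next[x];
        if (next[x] != NIL) prev[next[x]] = prev[x];
        next[x] = free_head;         // `next` doubles as the free-list link
        free_head = x;
        --count;
    }
    void pack() {                    // O(n): remove all holes, keep list order
        std::vector<int> keys = keys_in_order();
        for (int i = 0; i < count; ++i) {
            key[i] = keys[i];
            prev[i] = i > 0 ? i - 1 : NIL;
            next[i] = i + 1 < count ? i + 1 : NIL;
        }
        head = count > 0 ? 0 : NIL;
        used = count;
        free_head = NIL;
    }
    std::vector<int> keys_in_order() const {
        std::vector<int> out;
        for (int x = head; x != NIL; x = next[x]) out.push_back(key[x]);
        return out;
    }
    int high_water() const { return used; }  // number of slots ever handed out

private:
    int allocate() {                 // holes first, then the untouched tail
        if (free_head != NIL) { int x = free_head; free_head = next[x]; return x; }
        if (used < static_cast<int>(key.size())) return used++;
        return NIL;
    }
    std::vector<int> key, next, prev;
    int head = NIL, free_head = NIL, used = 0, count = 0;
};
```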

C++ fixed-size linked list

Non-duplicates:
Which STL C++ container to use for a fixed size list? (Specific use case)
std::list fixed size (See below)
Motives:
Allocation happens once (in the constructor) and deallocation happens once (in the destructor).
O(1) insertion and removal of an element anywhere in the list without needing to deal with the overhead of memory management. This isn't possible with an array-based implementation.
Is there a straightforward approach for implementing this using the standard library? Is there an implementation of something like this in Boost?
My first thought when I read that was to use a different allocator, i.e. one that pre-allocates a given number of elements to avoid the cost of allocating. I'm not familiar with defining allocators, though; if you find out, I'd be interested in the results.
Without that, here's a different approach. I saved myself the template ... stuff, but I guess you'll be able to do that yourself if you need.
#include <cstddef>
#include <list>
#include <new>   // std::bad_alloc

typedef int element_type;   // placeholder; turn this into a template parameter
typedef std::list<element_type> list_t;
struct fslist: private list_t
{
    // reuse some parts from the baseclass
    using list_t::iterator;
    using list_t::const_iterator;
    using list_t::begin;
    using list_t::end;
    using list_t::empty;
    using list_t::size;
    // pre-allocate nodes so that size() plus the free nodes equals n
    void reserve(std::size_t n)
    {
        std::size_t s = size();
        // TODO: Do what std::vector does when reserving less than the size.
        if(n < s)
            return;
        m_free_list.resize(n - s);
    }
    void push_back(element_type const& e)
    {
        reserve_one();
        m_free_list.front() = e;
        // move the node over; no allocation happens here
        splice(end(), m_free_list, m_free_list.begin());
    }
    void erase(iterator it)
    {
        // hand the node back to the free list instead of freeing it
        m_free_list.splice(m_free_list.begin(), *this, it);
    }
private:
    // make sure we have space for another element
    void reserve_one()
    {
        if(m_free_list.empty())
            throw std::bad_alloc();
    }
    list_t m_free_list;
};
This is incomplete, but it should get you started. Also note that splice() is not made public, because moving elements from or to a different list would change both size and capacity.
I think the simplest way to do it would be to have two data structures: the list itself, and an array/vector which is fixed-size and is used for "allocation". You simply grab an element from the array to create a node and insert it into your list. Something like this seems to meet your requirements:
struct node {
    node *prev;
    node *next;
    int value;
};
node storage[N];
node *storage_ptr = storage;
then to create a new node:
if(storage_ptr == storage + N) {
    /* out of space */
}
node *new_node = storage_ptr++;
// insert new_node into linked list
This is fixed size, allocated all at once, and when storage goes out of scope, the nodes will be destroyed with it.
As for efficiently removing items from the list, it is doable, but slightly more complex. I would have a secondary linked list for "removed" nodes. When you remove a node from the main list, insert it at the beginning (or end) of the "removed" list.
When allocating, check the removed list first before going to the storage array. If it's empty (NULL), use storage; otherwise, pluck a node off the list.
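A sketch combining the storage array with a "removed" list, repeating the node struct from above. The removed list is threaded through each node's `next` pointer, so it costs no extra memory. `FixedNodePool`, `allocate` and `release` are hypothetical names, and N is a compile-time capacity.

```cpp
#include <cassert>
#include <cstddef>

struct node {
    node *prev;
    node *next;
    int value;
};

// Hypothetical fixed-capacity pool: fresh nodes come from `storage`,
// recycled ones from a singly linked "removed" list.
template <std::size_t N>
class FixedNodePool {
public:
    FixedNodePool() = default;
    FixedNodePool(const FixedNodePool&) = delete;  // storage_ptr points into this object
    FixedNodePool& operator=(const FixedNodePool&) = delete;

    node* allocate() {
        if (removed) {                          // check the removed list first
            node* n = removed;
            removed = removed->next;
            return n;
        }
        if (storage_ptr == storage + N) return nullptr;  // out of space
        return storage_ptr++;
    }
    void release(node* n) {                     // caller has already unlinked n
        n->next = removed;
        removed = n;
    }
private:
    node storage[N] = {};
    node* storage_ptr = storage;
    node* removed = nullptr;
};
```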
I ended up writing a template for this called rigid_list.
It's far from complete but it's a start:
https://github.com/eltomito/rigid_list
(motivated by Ulrich Eckhardt's answer)