c++ freeing memory in priority_queue

c++ freeing memory in priority_queue - c++

I have a
priority_queue<node*, std::vector<node*>, CompareNodes> heap;
Let's say the node consists of:
class node {
public:
int value;
int key;
int order = 1000000;
};
How do I free the memory after i'm done with the priority queue?
My approach doesn't seem to be working:
while (heap.top()) {
node * t = heap.top();
heap.pop();
delete t;
}

Looks like you'll want to do something more like this:
while (!heap.empty())
{ /* the rest ... */ }
If the heap is empty, .top() will throw an exception because there's nothing to return, which will happen when you are popping elements.
Also, if available you should use
priority_queue<std::unique_ptr<node>, std::vector<std::unique_ptr<node>>, CompareNodes> heap;
so you don't have to worry about clearing the memory yourself.

Just like most std:: containers, the memory may or may not be freed when you want it to be. Memory is usually kept around for a longer time so that when you perform a heap.push or equivalent operation, the memory doesn't need to be allocated again.
Think of std::vector which has to allocate a new set of memory for the entire vector each time it grows (vector data must be contiguous in memory). It is more efficient for std::vector to perform a large one time allocation and keep the memory around so that the growth operation doesn't kill performance -- a) allocate new space big enough, b) copy entire contents of existing vector to new vector space, c) delete the old vector space.
Bottom line is you can't force it to free memory for individual items.

Related

Allocating Sufficient Memory for a Known Number of Structs

First time implementing a graph where the total number of nodes is known when the constructor is called and performance is the highest priority.
Never allocated memory before, so the process is a little hazy.
The number of nodes required is (n*(n+1))/2 where n is the length of the string passed to the constructor.
#include <string>
struct ColorNode {
ColorNode* lParent;
ColorNode* rParent;
char color;
};
class ParentGraph {
std::string base;
int len, nodes;
ParentGraph(std::string b): base(b) {
len = base.length();
nodes = (len * (len + 1)) / 2;
// how to allocate enough memory for number of copies of "ColorNode" equal to "nodes"?
}
};
What is the best practice for allocating memory in this instance?
Will allocating the memory beforehand make a significant difference in performance?
It may turn out that an array or vector is a better choice, but really need the practice in both data structures and memory allocation.
Thanks for the consideration.

use
std::vector<ColorNode> nodes;
life will be very simple after that.
You can be helpful to std::vector if you know the size you want
auto nodes = std::vector<ColorNode>(size);
This will allocate a contiguous array on the heap for you, manage its growth, allocation, deallocation etc.
You will basically get the same in memory structure if you do new ColorNode[size] (or even malloc(....) if some evil person tried to persuade you that raw malloc will be faster). But you have to do all the nasty management yourself.
You only need to diverge from this advice if you have too many objects to fit into one contiguous memory block. If thats the case say so

C++ avoid dynamic memory allocation

Imagine I have some Node struct that contains pointers to the left and right children and some data:
struct Node {
int data;
Node *left;
Node *right;
};
Now I want to do some state space search, and naturally I want to construct the graph as I go. So I will have a kind of loop that will have to create Nodes and keep them around. Something like:
Node *curNode = ... ; // starting node
while (!done) {
// ...
curNode->left = new Node();
curNode->right = new Node();
// ..
// Go left (for example)
curNode = curNode->left;
}
The problem is that I have to dynamically allocate node on each iteration, which is slow. So the question is: how can I have pointers to some memory but not by allocating it one by one?
The first solution I thought of is to have a std::vector<Node> that will contain all the allocated nodes. The problem is that when we push_back elements, all references might be invalidated, so all my left/right pointers will be garbage.
The second solution is to allocate a big chunk of memory upfront, and then we just grab the next available pointer when we want to create a Node. To avoid references invalidation, we just have to create a linked list of big chunks of memory when we exceed the capacity of the current chunk so every given pointer stays valid. I think that std::deque behaves like this, but it's not explicitly created for this.
Another solution would be to store vector indices instead of pointers but this is not a solution because a Node doesn't want to be associated with any container, it wants the pointer directly.
So what is the good solution here, that would avoid having to allocated new nodes on each iteration?

You can use std::deque<Node> and it will do memory management for you creating elements by groups and no invalidating pointers if you do not delete elements in middle. Though if you want to have more precise control on how many elements in a group you can quite simply create something like that:
class NodePool {
constexpr size_t blockSize = 512;
using Block = std::array<Node,blockSize>;
using Pool = std::list<Block>;
size_t allocated = blockSize;
Pool pool;
public:
Node *allocate()
{
if( allocated == blockSize ) {
pool.emplace_back();
allocated = 0;
}
return &( pool.back()[ allocated++ ] );
}
};
I did not try to compile it, but it should be enough to exress the idea. Here changing blockSize you can fine tune performance of your program. Though you should be aware than Node objects will be fully constructed by groups (unlike hoiw std::deque would do it). As much as I am aware there is no way to create raw memory for Node objects which is standard comformant.

Fast way to push_back a vector many times

I have identified a bottleneck in my c++ code, and my goal is to speed it up. I am moving items from one vector to another vector if a condition is true.
In python, the pythonic way of doing this would be to use a list comprehension:
my_vector = [x for x in data_vector if x > 1]
I have hacked a way to do this in C++, and it is working fine. However, I am calling this millions of times in a while-loop and it is slow. I do not understand much about memory allocation, but I assume that my problem has to do with allocating memory over-and-over again using push_back. Is there a way to allocate my memory differently to speed up this code? (I do not know how large my_vector should be until the for-loop has completed).
std::vector<float> data_vector;
// Put a bunch of floats into data_vector
std::vector<float> my_vector;
while (some_condition_is_true) {
my_vector.clear();
for (i = 0; i < data_vector.size(); i++) {
if (data_vector[i] > 1) {
my_vector.push_back(data_vector[i]);
}
}
// Use my_vector to render graphics on the GPU, but do not change the elements of my_vector
// Change the elements of data_vector, but not the size of data_vector
}

Use std::copy_if, and reserve data_vector.size() for my_vector initially (as this is the maximum possible number of elements for which your predicate could evaluate to true):
std::vector<int> my_vec;
my_vec.reserve(data_vec.size());
std::copy_if(data_vec.begin(), data_vec.end(), std::back_inserter(my_vec),
[](const auto& el) { return el > 1; });
Note that you could avoid the reserve call here if you expect that the number of times that your predicate evaluates to true will be much less than the size of the data_vector.

Though there are various great solutions posted by others for your query, it seems there is still no much explanation for the memory allocation, which you do not much understand, so I would like to share my knowledge about this topic with you. Hope this helps.
Firstly, in C++, there are several types of memory: stack, heap, data segment.
Stack is for local variables. There are some important features associated with it, for example, they will be automatically deallocated, operation on it is very fast, its size is OS-dependent and small such that storing some KB of data in the stack may cause an overflow of memory, et cetera.
Heap's memory can be accessed globally. As for its important features, we have, its size can be dynamically extended if needed and its size is larger(much larger than stack), operation on it is slower than stack, manual deallocation of memory is needed (in nowadays's OS, the memory will be automatically freed in the end of program), et cetera.
Data segment is for global and static variables. In fact, this piece of memory can be divided into even smaller parts, e.g. BBS.
In your case, vector is used. In fact, the elements of vector are stored into its internal dynamic array, that is an internal array with a dynamic array size. In the early C++, a dynamic array can be created on the stack memory, however, it is no longer that case. To create a dynamic array, ones have to create it on heap. Therefore, the elements of vector are stored in an internal dynamic array on heap. In fact, to dynamically increase the size of an array, a process namely memory reallocation is needed. However, if a vector user keeps enlarging his or her vector, then the overhead cost of reallocation cost will be high. To deal with it, a vector would firstly allocate a piece of memory that is larger than the current need, that is allocating memory for potential future use. Therefore, in your code, it is not that case that memory reallocation is performed every time push_back() is called. However, if the vector to be copied is quite large, the memory reserved for future use will be not enough. Then, memory allocation will occur. To tackle it, vector.reserve() may be used.
I am a newbie. Hopefully, I have not made any mistake in my sharing.
Hope this helps.

Run the code twice, first time only counting, how many new elements you will need. Then use reserve to already allocate all the memory you need.
while (some_condition_is_true) {
my_vector.clear();
int newLength = 0;
for (i = 0; i < data_vector.size(); i++) {
if (data_vector[i] > 1) {
newLength++;
my_vector.reserve(newLength);
for (i = 0; i < data_vector.size(); i++) {
if (data_vector[i] > 1) {
my_vector.push_back(data_vector[i]);
}
}
// Do stuff with my_vector and change data_vector
}

I doubt allocating my_vector is the problem, especially if the while loop is executed many times as the capacity of my_vector should quickly become sufficient.
But to be sure you can just reserve capacity in my_vector corresponding to the size of data_vector:
my_vector.reserve(data_vector.size());
while (some_condition_is_true) {
my_vector.clear();
for (auto value : data_vector) {
if (value > 1)
my_vector.push_back(value);
}
}

If you are on Linux you can reserve memory for my_vector to prevent std::vector reallocations which is bottleneck in your case. Note that reserve will not waste memory due to overcommit, so any rough upper estimate for reserve value will fit your needs. In your case the size of data_vector will be enough. This line of code before while loop should fix the bottleneck:
my_vector.reserve(data_vector.size());

Dynamically switching between stack and heap

Suppose I'm writing a simple buffer class. This buffer would act as a simple wrapper for a standard C array of objects. It should also be backwards-compatible to work with existing functions that take simple arrays as input.
The goal here is to make this buffer efficient in both speed and memory usage. Since stack allocation is always faster than heap, I want to allocate everything on a stack to a certain threshold, and if it grows larger, re-allocate on the heap. How can this be done efficiently?
I researched, and apparently std::string does something similar. I'm just not sure how. The closest solution I've had has been something along the lines of (pseudo-code, not compiled):
template <typename T, int MinSize>
class Buffer
{
public:
void Push(const T& t)
{
++_size;
if (_size > MinSize && _heap == NULL)
{
// allocate _heap and copy contents from stack
// _stack is unused and wasted memory
}
else if (_heap != NULL)
{
// we already allocated _heap, append to it, re-allocate if needed
}
else
{
// still got room on stack, append to _stack
}
}
void Pop()
{
--_size;
if (_size <= MinSize && _heap != NULL)
{
// no need for _heap anymore
// copy values to _stack, de-allocate _heap
}
else if (_heap != NULL)
{
// pop from heap
}
else
{
// pop from stack
}
}
private:
T _stack[MinSize];
T* _heap;
int _size;
};
As you can see, _stack is simply wasted space when the buffer grows beyond MinSize. Also, push and pop can be especially costly if Buffer is large enough. Another solution was to keep the first few elements always on stack, and put the overflow on heap. But that would mean the Buffer could not be 'converted' to a simple array.
Is there a better solution? If this is done in std::string, could anybody point out how or provide some resources?

I would suggest you use a pointer _data instead of _heap, which always refers to your data store. _heap == NULL would become _data == _stack and so on, but in all situations which don't chanmge the length of the data, you could avoid the case distinction.
Your current sketch doesn't include a _capacity member to keep track of the currently allocated space. YOu'll need that to implement the “append to it, re-allocate if needed” part, unless you want to reallocate for each and every length change of a heap-allocated container.
You might also consider not freeing the heap space the moment your data fits onto the stack. Otherwise there might be applications adding and removing a single element just at that boundary, causing an allocation each time. So either implement some hysteresis or don't free the heap space at all once you've allocated it. In general I'd say freeing heap memory should go together with shrinking heap memory. Both of these you might want to do either automatically, in response to a certain function call like shrink_to_fit, or not at all, but there is little point in doing one but not the other in a similar situation.
Apart from this, I believe your solution is pretty much all you can hope for. Perhaps provide a default value for MinSize. If MinSize is small, to avoid stack overflows, then wasting that space isn't going to be much of a problem, is it?
Of course, in the end it all depends on your actual application, as a lot of unused stack allocations of this form might have an adverse impact e.g. on the caching of stack memory. Given the fact that default allocators can be pretty smart as well, you probably should benchmark whether you actually gain anything from this optimization, for a given application.

I am not convinced that you need a new data structure here. It seems to me that you really want is a new allocator, to be used with whatever structure you think is best.
In C++03, this would have been relatively difficult, however C++11 now requires that STL containers work with stateful allocators, so you could perfectly create an allocator with a small stack for its own use... and use that as an argument to std::vector<>.
Example (using template aliases)
template <typename T, size_t N = 8>
using SmallVector = std::vector<T, SmallAllocator<T, N>>;
This way you'll benefit from all the robustness and optimizations that went into the implementation of std::vector, and you'll just provide the allocation layer, which was the goal initially, it seems.

Vector of vector pointer memory allocation

First I want to say that, I have a vector which has thousand of vectors inside. Each of these inside vectors has thousand of numbers inside. I want to keep memory management safe and memory usage at minimum as much as possible.
I want to ask that if I have a code similiar to below
int size = 10;
vector<vector<double>>* something = new vector<vector<double>>(size);
vector<double>* insideOfSomething;
for(int i = 0; i < size; i++){
insideOfSomething = &(something->at(i));
//...
//do something with insideOfSomething
//...
}
I know that 'something' will be created in heap. What I don't understand is where the vectors are placed, 'insideOfSomething' points? If they are created in stack, then this means that I have a vector pointer, which points a vector in heap, that has vectors inside which are created in stack? (I'm very confused right now.)
If I have a code similiar to the one below;
vector<vector<double>*>* something = new vector<vector<double>*>(size);
vector<double>* insideOfSomething;
for(int i = 0; i < size; i++){
something->at(i) = new vector<double>();
insideOfSomething = something->at(i);
//...
//do something with inside insideOfSomething
//...
}
right know all of my vectors are stored in heap, right?
Which one is more usefull according to the memory management?

You should avoid allocating vectors on the heap and just declare them on the stack since the vector will manage its objects on the heap for you. Anywhere you want to avoid creating a copy you can just use a reference or const reference (which ever is necessary).
vector<vector<double> > something(size);
for(int i = 0; i < size; i++)
{
vector<double> &insideOfSomething = something.at(i);
//use insideOfSomething
}

Let's take a random, simplistic implementation of vector, as I think this will help you.
template <class T, class Alloc>
class vector
{
private:
T* buffer;
std::size_t vector_size;
std::size_t vector_capacity
Alloc alloc;
public:
...
};
In this case, if we write:
vector<int> v;
v.push_back(123);
... the pointer, buffer, the integrals: vector_size and vector_capacity, and the allocator object, alloc, will all be created on the stack (along with allocating any additional memory necessary for structure padding and alignment).
However, vector itself will allocate memory on the heap to which this buffer pointer will store its base address. That will always be on the heap and will contain the actual contents of the vector as we think of them.
This is still more efficient than this:
vector<int>* v = new vector<int>;
v->push_back(123);
...
delete v;
... as this would involve a heap allocation/deallocation for the vector itself (including its data members) in addition to the memory vector itself allocates for its internal contents (the buffer). It also introduces an additional level of indirection.
Now if we have a vector of Somethings (vector of vector or anything else):
vector<Something> v;
Those Something instances are always going to be allocated within a contiguous heap buffer since they would reside in the dynamically allocated memory blocks that vector creates and destroys internally.

In vector<> all data stored in heap
And i think you should simply use
vector< vector<double> > something;

I want to keep memory management safe and memory usage at minimum as much as possible.
Then
vector<vector<double>>* something = new vector<vector<double>>(size);
is already not good. As said in the other answers, vector already has its data on the heap, no need to mess around with new to achieve this. In fact, the objects' location is like
S t a c k H e a p
(vector<double>) sthng[0]
(vector<vector<double>>) sthng (vector<double>) sthng[1]
...
- - - - - -
(double) sthng[0][0]
(double) sthng[0][1]
...
- - - - - -
(double) sthng[1][0]
(double) sthng[1][1]
...
(of course, there is no particular ordering of the blocks on the heap)

Joe and hired777's answers explain that a vector will be allocated on the heap no matter what. I'll try to give some insight on the reason for this.
A vector is a resizeable container. Generally it doubles in size when it reaches capacity which means it needs to be able to allocate more memory than it had already allocated. Hence even when you declare vector inside a function and hence on the stack, internally it's holding a pointer to it's data on the heap and on going out of the function's scope, it's destructor will delete this data from the heap.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js