Suppose I'm writing a simple buffer class. This buffer would act as a simple wrapper for a standard C array of objects. It should also be backwards-compatible to work with existing functions that take simple arrays as input.
The goal here is to make this buffer efficient in both speed and memory usage. Since stack allocation is generally faster than heap allocation, I want to allocate everything on the stack up to a certain threshold, and if it grows larger, re-allocate on the heap. How can this be done efficiently?
I did some research, and apparently std::string does something similar. I'm just not sure how. The closest I've come is something along the lines of this (pseudo-code, not compiled):
template <typename T, int MinSize>
class Buffer
{
public:
    Buffer() : _heap(NULL), _size(0) {}

    void Push(const T& t)
    {
        ++_size;
        if (_size > MinSize && _heap == NULL)
        {
            // allocate _heap and copy contents from stack
            // _stack is unused and wasted memory
        }
        else if (_heap != NULL)
        {
            // we already allocated _heap, append to it, re-allocate if needed
        }
        else
        {
            // still got room on stack, append to _stack
        }
    }

    void Pop()
    {
        --_size;
        if (_size <= MinSize && _heap != NULL)
        {
            // no need for _heap anymore
            // copy values to _stack, de-allocate _heap
        }
        else if (_heap != NULL)
        {
            // pop from heap
        }
        else
        {
            // pop from stack
        }
    }

private:
    T _stack[MinSize];
    T* _heap;
    int _size;
};
As you can see, _stack is simply wasted space when the buffer grows beyond MinSize. Also, push and pop can be especially costly if Buffer is large enough. Another solution was to keep the first few elements always on stack, and put the overflow on heap. But that would mean the Buffer could not be 'converted' to a simple array.
Is there a better solution? If this is done in std::string, could anybody point out how or provide some resources?
I would suggest you use a pointer _data instead of _heap, which always refers to your data store. _heap == NULL would become _data == _stack and so on, but in all situations which don't change the length of the data, you could avoid the case distinction.
Your current sketch doesn't include a _capacity member to keep track of the currently allocated space. You'll need that to implement the “append to it, re-allocate if needed” part, unless you want to reallocate for each and every length change of a heap-allocated container.
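For illustration, a minimal sketch of that member layout (the operator[] is only there to show that plain element access never needs to branch):
template <typename T, int MinSize>
class Buffer
{
public:
    // Push/Pop as before, but only length-changing operations need to check
    // whether the data currently lives in _stack or on the heap.
    T& operator[](int i) { return _data[i]; }   // reads/writes never branch

private:
    T   _stack[MinSize];
    T*  _data     = _stack;    // always points at the active store (stack or heap)
    int _size     = 0;         // number of elements currently stored
    int _capacity = MinSize;   // how many elements fit in the current store
};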
You might also consider not freeing the heap space the moment your data fits onto the stack. Otherwise there might be applications adding and removing a single element just at that boundary, causing an allocation each time. So either implement some hysteresis or don't free the heap space at all once you've allocated it. In general I'd say freeing heap memory should go together with shrinking heap memory. Both of these you might want to do either automatically, in response to a certain function call like shrink_to_fit, or not at all, but there is little point in doing one but not the other in a similar situation.
Apart from this, I believe your solution is pretty much all you can hope for. Perhaps provide a default value for MinSize. If MinSize is small, to avoid stack overflows, then wasting that space isn't going to be much of a problem, is it?
Of course, in the end it all depends on your actual application, as a lot of unused stack allocations of this form might have an adverse impact e.g. on the caching of stack memory. Given the fact that default allocators can be pretty smart as well, you probably should benchmark whether you actually gain anything from this optimization, for a given application.
I am not convinced that you need a new data structure here. It seems to me that what you really want is a new allocator, to be used with whatever structure you think is best.
In C++03 this would have been relatively difficult; however, C++11 now requires that STL containers work with stateful allocators, so you could perfectly well create an allocator with a small stack buffer for its own use... and use that as an argument to std::vector<>.
Example (using template aliases)
template <typename T, size_t N = 8>
using SmallVector = std::vector<T, SmallAllocator<T, N>>;
This way you'll benefit from all the robustness and optimizations that went into the implementation of std::vector, and you'll just provide the allocation layer, which was the goal initially, it seems.
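For what it's worth, here is a minimal sketch of what such a SmallAllocator might look like (my own illustration, not a production-ready allocator: it only special-cases the first allocation that fits in the internal buffer, and since the buffer lives inside the allocator object, moving or swapping containers that use it needs extra care):
#include <cstddef>
#include <new>

template <typename T, std::size_t N = 8>
class SmallAllocator
{
public:
    using value_type = T;

    SmallAllocator() = default;

    // Rebound copies get their own (empty) buffer; they never share storage.
    template <typename U>
    SmallAllocator(const SmallAllocator<U, N>&) noexcept {}

    template <typename U>
    struct rebind { using other = SmallAllocator<U, N>; };

    T* allocate(std::size_t n)
    {
        // Hand out the internal buffer for the first request that fits,
        // otherwise fall back to the regular heap.
        if (!_used && n <= N)
        {
            _used = true;
            return reinterpret_cast<T*>(_buffer);
        }
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        if (p == reinterpret_cast<T*>(_buffer))
            _used = false;
        else
            ::operator delete(p);
    }

private:
    alignas(T) unsigned char _buffer[N * sizeof(T)];
    bool _used = false;
};

// Storage from one instance can only be released through that same instance.
template <typename T, std::size_t N, typename U, std::size_t M>
bool operator==(const SmallAllocator<T, N>& a, const SmallAllocator<U, M>& b) noexcept
{
    return static_cast<const void*>(&a) == static_cast<const void*>(&b);
}

template <typename T, std::size_t N, typename U, std::size_t M>
bool operator!=(const SmallAllocator<T, N>& a, const SmallAllocator<U, M>& b) noexcept
{
    return !(a == b);
}
With that, a SmallVector<int> keeps its first few elements in the allocator's buffer (which lives inside the vector object itself) and falls back to the heap once it grows past N.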
I have identified a bottleneck in my c++ code, and my goal is to speed it up. I am moving items from one vector to another vector if a condition is true.
In Python, the Pythonic way of doing this would be to use a list comprehension:
my_vector = [x for x in data_vector if x > 1]
I have hacked a way to do this in C++, and it is working fine. However, I am calling this millions of times in a while-loop and it is slow. I do not understand much about memory allocation, but I assume that my problem has to do with allocating memory over and over again using push_back. Is there a way to allocate my memory differently to speed up this code? (I do not know how large my_vector should be until the for-loop has completed.)
std::vector<float> data_vector;
// Put a bunch of floats into data_vector

std::vector<float> my_vector;
while (some_condition_is_true) {
    my_vector.clear();
    for (std::size_t i = 0; i < data_vector.size(); i++) {
        if (data_vector[i] > 1) {
            my_vector.push_back(data_vector[i]);
        }
    }
    // Use my_vector to render graphics on the GPU, but do not change the elements of my_vector
    // Change the elements of data_vector, but not the size of data_vector
}
Use std::copy_if, and reserve data_vector.size() for my_vector initially (as this is the maximum possible number of elements for which your predicate could evaluate to true):
std::vector<float> my_vec;
my_vec.reserve(data_vec.size());

std::copy_if(data_vec.begin(), data_vec.end(), std::back_inserter(my_vec),
             [](const auto& el) { return el > 1; });
Note that you could avoid the reserve call here if you expect that the number of times that your predicate evaluates to true will be much less than the size of the data_vector.
Though there are various great solutions posted by others for your query, it seems there is still not much explanation of the memory allocation involved, which you say you don't fully understand, so I would like to share my knowledge of this topic with you.
Firstly, in C++ there are several kinds of memory: the stack, the heap, and the data segment.
The stack is for local variables. Some important features: variables on it are deallocated automatically, operations on it are very fast, and its size is OS-dependent and relatively small, so storing large amounts of data on the stack may cause a stack overflow, et cetera.
Heap memory can be accessed globally (through pointers). Its important features: its size can grow dynamically as needed and is much larger than the stack, operations on it are slower than on the stack, and memory must be deallocated manually (on modern operating systems it is reclaimed automatically when the program ends), et cetera.
The data segment is for global and static variables. In fact, this piece of memory can be divided into even smaller parts, e.g. the BSS segment.
In your case, a vector is used. The elements of a vector are stored in an internal, dynamically allocated array. A runtime-sized array cannot be created on the stack in standard C++ (some compilers offer that as an extension), so it has to be created on the heap; therefore, the elements of a vector live in an internal dynamic array on the heap. To grow a dynamic array, a reallocation is needed, and if a vector user keeps enlarging the vector, the overhead of repeated reallocation would be high. To deal with this, a vector allocates more memory than it currently needs, i.e. it reserves capacity for potential future use. Therefore, in your code, a reallocation is not performed every time push_back() is called. However, if the vector to be copied is quite large, the memory reserved for future use may not be enough, and reallocations will occur. To avoid them, vector::reserve() may be used.
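A small sketch that makes the growth pattern visible (the exact capacities are implementation-defined):
#include <iostream>
#include <vector>

int main() {
    std::vector<float> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(1.0f);
        if (v.capacity() != last_capacity) {
            // The capacity jumped, so a reallocation (and element copy/move) happened.
            last_capacity = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last_capacity << '\n';
        }
    }
    // With v.reserve(1000) before the loop, only one allocation would occur.
}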
I am a newbie. Hopefully, I have not made any mistake in my sharing.
Hope this helps.
Run the code twice: the first time, only count how many new elements you will need; then use reserve to allocate all the memory you need up front.
while (some_condition_is_true) {
    my_vector.clear();

    // First pass: count how many elements will pass the filter.
    int newLength = 0;
    for (std::size_t i = 0; i < data_vector.size(); i++) {
        if (data_vector[i] > 1) {
            newLength++;
        }
    }

    // Reserve once, then do the actual copy in a second pass.
    my_vector.reserve(newLength);
    for (std::size_t i = 0; i < data_vector.size(); i++) {
        if (data_vector[i] > 1) {
            my_vector.push_back(data_vector[i]);
        }
    }

    // Do stuff with my_vector and change data_vector
}
I doubt allocating my_vector is the problem, especially if the while loop is executed many times as the capacity of my_vector should quickly become sufficient.
But to be sure you can just reserve capacity in my_vector corresponding to the size of data_vector:
my_vector.reserve(data_vector.size());
while (some_condition_is_true) {
    my_vector.clear();
    for (auto value : data_vector) {
        if (value > 1)
            my_vector.push_back(value);
    }
}
If you are on Linux, you can reserve memory for my_vector up front to prevent the std::vector reallocations that are the bottleneck in your case. Note that, due to overcommit, reserve will not waste physical memory, so any rough upper estimate for the reserve value will fit your needs. In your case, the size of data_vector will be enough. This line of code before the while loop should fix the bottleneck:
my_vector.reserve(data_vector.size());
I have a
priority_queue<node*, std::vector<node*>, CompareNodes> heap;
Let's say the node consists of:
class node {
public:
int value;
int key;
int order = 1000000;
};
How do I free the memory after I'm done with the priority queue?
My approach doesn't seem to be working:
while (heap.top()) {
node * t = heap.top();
heap.pop();
delete t;
}
Looks like you'll want to do something more like this:
while (!heap.empty())
{ /* the rest ... */ }
If the heap is empty, calling .top() is undefined behavior, because there is nothing to return, and that is exactly the state you will reach as you keep popping elements.
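Putting that together with the rest of your loop, the cleanup would look roughly like this:
while (!heap.empty()) {
    node* t = heap.top();
    heap.pop();
    delete t;
}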
Also, if available, you should use
priority_queue<std::unique_ptr<node>, std::vector<std::unique_ptr<node>>, CompareNodes> heap;
(with CompareNodes adjusted to compare std::unique_ptr<node> arguments) so you don't have to worry about freeing the memory yourself.
Just like most std:: containers, the memory may or may not be freed when you want it to be. Memory is usually kept around for a longer time so that when you perform a heap.push or equivalent operation, the memory doesn't need to be allocated again.
Think of std::vector which has to allocate a new set of memory for the entire vector each time it grows (vector data must be contiguous in memory). It is more efficient for std::vector to perform a large one time allocation and keep the memory around so that the growth operation doesn't kill performance -- a) allocate new space big enough, b) copy entire contents of existing vector to new vector space, c) delete the old vector space.
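For example (a small sketch; the exact capacity value is implementation-dependent):
std::vector<int> v(1000, 0);
v.clear();                          // size() becomes 0 ...
std::cout << v.capacity() << '\n';  // ... but the capacity typically stays at 1000 or more,
                                    // so the memory is kept around for future growth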
Bottom line is you can't force it to free memory for individual items.
I just noticed that QList doesn't have a resize method, while QVector, for example, has one. Why is this? And is there an equivalent function?
Well, this is a more generic answer, but I hope you will see, by comparing QList and QVector, why there is no need to manually expand the container.
QList uses an internal buffer to store pointers to the elements (or, if the element is not larger than a pointer, or the element is one of the shared classes, the element itself), and the real data is kept on the heap.
Removing data does not shrink the internal buffer over time (the empty space is filled by shifting the remaining elements left or right, leaving room at the beginning and the end for later insertions).
Appending items creates additional space at the end of the array, just as with QVector, but since, unlike QVector, the real data is not stored in the internal buffer, you can create a lot of space in a single instruction no matter how large the item type is, because you are simply adding pointers to the indexing buffer.
For example, if you are on a 32-bit system (4 bytes per pointer) and you are storing 50 items in a QList, each 1 MB big, the QVector buffer would need to be resized to 50 MB, while QList's internal buffer only needs to allocate 200 B of memory. This is where you need to call resize() on QVector, but with QList there is no need, since allocating such a small chunk of memory is not a problem, unlike allocating 50 MB.
However, there is a price for this, which means you will sometimes want to prefer QVector over QList: for each item stored in the QList, you need one additional heap allocation to hold the real data of the item (the data the pointer in the internal buffer points to). If you want to add 10000 items larger than a pointer (items that fit into a pointer are stored directly in the internal buffer), you will need 10000 allocations for the 10000 items on the heap. But if you use QVector and call resize, you can fit all the items in a single allocation, so don't use QList if you need a lot of inserting or appending; prefer QVector for that. Of course, if you are using QList to store shared classes, there is no additional allocation, which again makes QList more suitable.
So, prefer QList for most cases, as it:
Uses indices to access individual elements, so accessing items is faster than with QLinkedList.
Inserting into the middle of the list only requires moving pointers to create space, which is faster than shifting the actual QVector data around.
Needs no manual reserving or resizing of space, as empty space is moved to the end of the buffer for later use, and allocating space in the array is very fast, since the elements are very small and it can allocate a lot of space without exhausting your memory.
Don't use it in the following scenarios, and prefer QVector:
If you need to ensure that your data is stored in sequential memory locations.
If you rarely insert data at random positions, but you append a lot of data at the end or the beginning, which can cause a lot of unnecessary allocations, and you still need fast indexing.
If you are looking for a (shared) replacement for simple arrays which will not grow over time.
And, finally, note: QList (and QVector) have a reserve(int alloc) function which will cause the internal buffer to grow if alloc is greater than its current size. However, this does not affect the external size of the QList (size() will always return the exact number of elements contained in the list).
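A quick sketch of that last point:
QList<int> list;
list.reserve(100);    // grows the internal buffer ...
// list.size() == 0   // ... but the list still reports zero elements

QVector<int> vec;
vec.resize(100);      // actually creates 100 default-constructed elements
// vec.size() == 100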
I think the reason is that QList doesn't require the element type to have a default constructor.
As a result, there is no operation where QList ever creates an object; it only copies them.
But if you really need to resize a QList (for whatever reason), here's a function that will do it. Note that it's just a convenience function, and it's not written with performance in mind.
template<class T>
void resizeList(QList<T> & list, int newSize) {
    int diff = newSize - list.size();
    T t;
    if (diff > 0) {
        list.reserve(newSize);
        while (diff--) list.append(t);
    } else if (diff < 0) {
        list.erase(list.end() + diff, list.end());
    }
}
wasle's answer is good, but it will add the same object multiple times. Here is a utility function that adds a distinct object for each new entry, for a list of smart pointers.
template<class T>
void resizeSmartList(QList<QSharedPointer<T> > & list, int newSize) {
    int diff = newSize - list.size();
    if (diff > 0) {
        list.reserve(newSize);
        while (diff > 0) {
            QSharedPointer<T> t = QSharedPointer<T>(new T);
            list.append(t);
            diff--;
        }
    } else if (diff < 0) {
        list.erase(list.end() + diff, list.end());
    }
}
For use without smart pointers (i.e. a list of raw pointers), the following will add distinct objects to your list.
template<class T>
void resizeList(QList<T*> & list, int newSize) {
    int diff = newSize - list.size();
    if (diff > 0) {
        list.reserve(newSize);
        while (diff > 0) {
            T* t = new T;
            list.append(t);
            diff--;
        }
    } else if (diff < 0) {
        // note: erasing only removes the pointers, it does not delete the pointed-to objects
        list.erase(list.end() + diff, list.end());
    }
}
Also remember that your type must have a default constructor (or a constructor whose arguments all have default values), or else this will not compile.
Just use something like
QList<Smth> myList;
// ... some operations on the list here
myList << QVector<Smth>(desiredNewSize - myList.size()).toList();
Essentially, these to/from Vector/List/Set() conversion methods are everywhere, which makes it easy to resize Qt containers when necessary in a somewhat manual, but simple and effective (I believe), way.
Another (1 or 2-liner) solution would be:
myList.reserve(newListSize); // note, how we have to reserve manually
std::fill_n(std::back_inserter(myList), desiredNewSize - myList.size(), Smth());
-- that's for STL-oriented folks :)
For some background on how complex an effective QList::resize() may get, see:
bugreports.qt.io/browse/QTBUG-42732 , and
codereview.qt-project.org/#/c/100738/1//ALL
Why is it not possible to get the length of a buffer allocated in this fashion?
AType * pArr = new AType[nVariable];
When the same array is deallocated
delete [] pArr;
the runtime must know how much to deallocate. Is there any means to access the length before deleting the array? If not, why is no such API provided to fetch the length?
Is there any means to access the length before deleting the array?
No, there is no way to determine that.
The standard does not require the implementation to remember and provide the number of elements requested through new.
The implementation may simply insert specific bit patterns at the end of allocated memory blocks instead of remembering the number of elements, and look for that pattern while freeing the memory.
In short, it is solely an implementation detail.
On a side note, there are two practical ways to work around this problem:
You can simply use a std::vector, which provides member functions like size(), or
You can simply do the bookkeeping yourself.
new allocates at least as much memory as you requested.
You already know how much memory you requested, so you can calculate the length easily. You can find the size of each item using sizeof.
Total memory requested / Memory required for 1 item = Number of items
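A minimal sketch of that bookkeeping (the struct and its names are illustrative, not from the answer; AType and nVariable are the identifiers from the question):
#include <cstddef>

struct OwnedArray {
    AType*      data;
    std::size_t count;   // the n you passed to new[]

    std::size_t bytes() const { return count * sizeof(AType); }
};

// Keep the count next to the pointer, so the "length" is always at hand.
OwnedArray arr = { new AType[nVariable], static_cast<std::size_t>(nVariable) };
// ... use arr.data and arr.count ...
delete[] arr.data;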
The runtime DOES know how much was allocated. However, such details are compiler-specific, so you don't have any cross-platform way to handle it.
If you would like the same functionality and be able to track the size you could use a std::vector as follows:
std::vector< AType > pArr( nVariable );
This has the added advantage of using RAII as well.
The delete[] operator doesn't need you to tell it the size in order to free the allocated memory, just as free doesn't: that bookkeeping is handled by the underlying heap implementation rather than by your code.
The runtime must deallocate the same amount as it allocated, and it does keep track of this in some manner (usually very indirectly). But there's no reliable way of getting from the amount allocated to the number of elements: the amount allocated cannot be less than the number of elements times the size of each element, but it will often be more. Alignment considerations, for example, mean that new char[5] and new char[8] will often allocate the same amount of memory, and there are various allocation strategies which can cause significantly more memory to be allocated than what is strictly necessary.
No, not really. At least not in a platform-independent, defined way.
Most implementations store the size of a dynamically allocated array before the actual array though.
There is no portable way in C++ to get the size of a dynamically allocated array from the raw pointer.
Under MSVC and WIN32 you can get the size of the allocated block with the _msize(void*) function.
see https://msdn.microsoft.com/en-us/library/z2s077bc.aspx for further details.
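A minimal MSVC-only sketch (using malloc here, since _msize is documented for pointers obtained from malloc/calloc/realloc; the reported block size may exceed what you asked for):
#include <cstdio>
#include <cstdlib>
#include <malloc.h>   // MSVC header that declares _msize

int main() {
    int* p = static_cast<int*>(std::malloc(10 * sizeof(int)));
    std::printf("requested %zu bytes, block is %zu bytes\n",
                10 * sizeof(int), _msize(p));
    std::free(p);
}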
I use this "dirty" method, only for debugging purpose:
T *p = new T[count];
size_t size = (char*)&(p[count]) - (char*)p;
This gives the size of the real data, but not any extra size that could have been allocated by the implementation.
For already aligned types T, it is equal to:
size_t size = sizeof(T) * count;
Of course this doesn't work if you don't know the count of items in the array.
Why not carry a bit of extra info, like this:
template <typename T> class AType
{
public:
AType(size_t s) : data(0)
{
a_size = s;
data = new T[s];
}
~AType() {
if (data != nullptr)
delete [] data;
}
size_t getSize() const
{
return a_size * sizeof(T);
}
private:
size_t a_size;
T* data;
};
Hi all,
I believe that the following piece of code is generating a memory leak, am I right?
/* External function to dynamically allocate a vector */
template <class T>
T *dvector(int n){
T *v;
v = (T *)malloc(n*sizeof(T));
return v;
}
/* Function that calls DVECTOR and, after computation, frees it */
void DiscontinuousGalerkin_Domain::computeFaceInviscidFluxes(){
int e,f,n,p;
double *Left_Conserved;
Left_Conserved = dvector<double>(NumberOfProperties);
//do stuff with Left_Conserved
//
free(Left_Conserved);
return;
}
I thought that, by passing the pointer to DVECTOR, it would allocate it and return the correct address, so that free(Left_Conserved) would successfully deallocate. However, it does not seem to be the case.
NOTE: I have also tested with new/delete replacing malloc/free without success either.
I have a similar piece of code for allocating a 2-D array. I decided to manage vectors/arrays like that because I am using them a lot, and I also would like to understand a bit deeper memory management with C++.
So, I would pretty much like to keep an external function to allocate vectors and arrays for me. What's the catch here to avoid the memory leak?
EDIT
I have been using the DVECTOR function to allocate user-defined types as well, so that is potentially a problem, I guess, since I don't have constructors being called.
Even though in the piece of code above I free the Left_Conserved vector, I would also like to be able to allocate a vector and leave it "open", to be accessed through its pointer by other functions. If using BOOST, it will automatically clean up the allocation at the end of the function, so I won't get a "public" array with BOOST, right? I suppose that's easily fixed with NEW, but what would be the better way for a matrix?
It has just occurred to me that I pass the pointer as an argument to other functions. Now, BOOST seems not to enjoy that much, and compilation exits with errors.
So, I still stand in need of a pointer to a vector or a matrix that accepts user-defined types and that will be passed as an argument to other functions. The vector (or matrix) would most likely be allocated in an external function, and freed in another suitable function. (I just wouldn't like to be copying the loop and new stuff for allocating the matrix everywhere in the code!)
Here is what I'd like to do:
template <class T>
T **dmatrix(int m, int n){
    T **A;
    A = (T **)malloc(m*sizeof(T *));
    A[0] = (T *)malloc(m*n*sizeof(T));
    for(int i=1;i<m;i++){
        A[i] = A[i-1]+n;
    }
    return A;
}

void Element::setElement(int Ptot, int Qtot){
    double **MassMatrix;
    MassMatrix = dmatrix<myT>(Ptot,Qtot);
    FillInTheMatrix(MassMatrix);
    return;
}
There is no memory leak there, but you should use new/delete[] instead of malloc/free, especially since your function is templated.
If you ever want to use a type which has a non-trivial constructor, your malloc-based function is broken, since it doesn't call any constructors.
I'd replace "dvector" with simply doing this:
void DiscontinuousGalerkin_Domain::computeFaceInviscidFluxes(){
double *Left_Conserved = new double[NumberOfProperties];
//do stuff with Left_Conserved
//
delete[] Left_Conserved;
}
It is functionally equivalent (except that it can call constructors for other types). It is simpler and requires less code. Plus, every C++ programmer will instantly know what is going on, since it doesn't involve an extra function.
Better yet, use smart pointers to completely avoid memory leaks:
void DiscontinuousGalerkin_Domain::computeFaceInviscidFluxes(){
boost::scoped_array<double> Left_Conserved(new double[NumberOfProperties]);
//do stuff with Left_Conserved
//
}
As many smart programmers like to say "the best code is the code you don't have to write"
EDIT: Why do you believe that the code you posted leaks memory?
EDIT: I saw your comment to another post saying
At code execution command top shows allocated memory growing indefinitely!
This may be completely normal (or may not be) depending on your allocation pattern. Usually the way heaps work is that they often grow, but don't often shrink (this is to favor subsequent allocations). Completely symmetric allocations and frees should allow the application to stabilize at a certain amount of usage.
For example:
while(1) {
free(malloc(100));
}
shouldn't result in continuous growth because the heap is highly likely to give the same block for each malloc.
So my question to you is. Does it grow "indefinitely" or does it simply not shrink?
EDIT:
You have asked what to do about a 2D array. Personally, I would use a class to wrap the details. I'd either use a library (I believe Boost has an n-dimensional array class), or you could roll your own; it shouldn't be too hard. Something like this may be sufficient:
http://www.codef00.com/code/matrix.h
Usage goes like this:
Matrix<int> m(2, 3);
m[1][2] = 10;
It is technically more efficient to use something like operator() for indexing a matrix wrapper class, but in this case I chose to simulate native array syntax. If efficiency is really important, it can be made as efficient as native arrays.
EDIT: another question: what platform are you developing on? If it is *nix, then I would recommend valgrind to help pinpoint your memory leak, since the code you've provided is clearly not the problem.
I don't know of any myself, but I am sure that Windows also has memory profiling tools.
EDIT: for a matrix if you insist on using plain old arrays, why not just allocate it as a single contiguous block and do simple math on indexing like this:
T *const p = new T[width * height];
then to access an element, just do this:
p[y * width + x] = whatever;
this way you do a delete[] on the pointer whether it is a 1D or 2D array.
There is no visible memory leak, but there is a high risk of a memory leak with code like this. Try to always wrap resources in an object (RAII).
std::vector does exactly what you want:
void DiscontinuousGalerkin_Domain::computeFaceInviscidFluxes(){
    int e,f,n,p;
    std::vector<double> Left_Conserved(NumOfProperties); // create vector with "NumOfProperties" initial entries

    // do stuff with Left_Conserved
    // exactly the same usage!
    for (int i = 0; i < NumOfProperties; i++){ // could also be "i < Left_Conserved.size()", since size() == NumOfProperties as long as you haven't added or removed elements since construction
        Left_Conserved[i] = e*f + n*p*i; // made-up operation
    }
    Left_Conserved.push_back(1.0); // vector automatically grows - no need to manually realloc
    assert(Left_Conserved.size() == NumOfProperties + 1); // yay - vector knows its size

    // you don't have to care about the memory: the Left_Conserved OBJECT will clean it up
    // (in the destructor, which is automatically called when the scope is left)
    return;
}
EDIT: added a few example operations. You really should read about the STL containers, they are worth it!
EDIT 2: for 2D you can use:
std::vector<std::vector<double> >
like someone suggested in the comments, but usage in 2D is a little more tricky. You should first look into the 1D case to understand what's happening (enlarging vectors, etc.).
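For example, using the dimensions from the question (Ptot x Qtot):
std::vector<std::vector<double> > MassMatrix(Ptot, std::vector<double>(Qtot, 0.0));
MassMatrix[1][2] = 10.0;   // element access then looks like a native 2-D array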
No, as long as you aren't doing anything drastic between the call to your dvector template and the free, you aren't leaking any memory. What tells you there is a memory leak?
May I ask why you chose to create your own arrays instead of using STL containers like vector or list? That'd certainly save you a lot of trouble.
I don't see a memory leak in this code.
If you write programs in C++, use new/delete instead of malloc/free.
To avoid possible memory leaks, use smart pointers or STL containers.
What happens if you pass a negative value for n to dvector?
Perhaps you should consider changing your function signature to take an unsigned type as the argument:
template< typename T >
T * dvector( std::size_t n );
Also, as a matter of style, I suggest always providing your own memory release function any time you are providing a memory allocation function. As it is now, callers rely on knowledge that dvector is implemented using malloc (and that free is the appropriate release call). Something like this:
template< typename T >
void dvector_free( T * p ) { free( p ); }
As others have suggested, doing this as an RAII class would be more robust. And finally, as others have also suggested, there are plenty of existing, time-tested libraries to do this so you may not need to roll your own at all.
So, some important concepts discussed here helped me track down the memory leak in my code. There were two main bugs:
The allocation with malloc of my user-defined types was buggy. However, when I changed it to new, the leaking got even worse, and that's because one of my user-defined types had a constructor calling an external function with no parameters and no correct memory management. Since I called that function after the constructor, there was no bug in the processing itself, only in the memory allocation. So new and a correct constructor solved one of the main memory leaks.
The other leak was related to a buggy memory-deallocation command, which I was able to isolate with Valgrind (and a bit of patience to read its output correctly). So, here's the bug (and, please, don't call me a moron!):
if (something){
//do stuff
return; //and here it is!!! =P
}
free();
return;
And that's where RAII, as I understand it, would prevent a mistake just like that. I haven't actually changed the code to std::vector or boost::scoped_array yet, because it is still not clear to me whether I can pass them as parameters to other functions. So, I still must be careful with delete[].
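(For what it's worth, a std::vector can be passed to other functions like any other object, typically by reference; a minimal sketch with made-up function names:)
void fillVector(std::vector<double>& v);          // modifies the caller's vector
double sumVector(const std::vector<double>& v);   // read-only access, no copy

void DiscontinuousGalerkin_Domain::computeFaceInviscidFluxes(){
    std::vector<double> Left_Conserved(NumberOfProperties);
    fillVector(Left_Conserved);                   // no raw pointers, no manual free
    double s = sumVector(Left_Conserved);
    // Left_Conserved cleans itself up when it goes out of scope
}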
Anyway, memory leaking is gone (by now...) =D