Memory leak when creating objects in a loop - C++

I am new to C++ and memory management. I have code that builds a graph composed of objects of type vertex (~100 bytes each) and edge (~50 bytes each). My code works fine when the graph is small, but with the real data, which has ~3M vertexes and ~10M edges, I get the runtime error std::bad_alloc when "new" is used (and not always at the same new).
Based on what I have gathered, this is the effect of a memory leak in my program that makes new memory allocations fail. My question is: what is wrong with the way I am allocating memory, and more importantly, how can I fix it? Here is roughly what I do:
In the graph class constructor I create the array repository for the objects of class vertex:
graph::graph()
{
    // vertexes is a class variable
    vertexes = new vertex *[MAX_AR_LEN]; // where MAX_AR_LEN = 3M
}
I then call a function like this to iteratively build the vertex objects and assign them to the array:
void graph::buildVertexes()
{
    for(int i = 0; i < v_num; i++)
        vertexes[i] = new vertex(strName);
}
I then complete other tasks, and at the end, before the program ends, the graph destructor explicitly deletes the array:
graph::~graph()
{
    delete[] vertexes;
    vertexes = 0;
}
Where is the leak happening? I am creating a lot of objects, but to my knowledge nothing that could be deleted remains undeleted.
I have been dealing with this for over a week now without much luck. Thank you very much for your help!
EDIT (after solving the issue):
Thanks all for the help. Looking back, with the info I provided it was hard to pinpoint what was going on. I solved the issue, and here are the very obvious points that I took away; so obvious that they might not be worth sharing, but here they are anyway:
When dealing with lots of objects that need to exist in memory simultaneously, use your best estimate of the minimum memory you need before you start coding. In my case, even without a leak, I would have almost maxed out memory; I just needed better estimates of memory use to figure that out.
As you go along developing your code, frequently running vld.h (or an alternative leak detector) can help you check that your design is free of memory leaks. Doing this only at the end can be a lot more complicated, and even if you find the leak, it might be harder to fix.
Let’s say you did all of this and you expect to have enough memory to run the code, but you still get a std::bad_alloc runtime error while there seems to be plenty of free memory available on your system. You might be compiling for a 32-bit platform; switching to 64-bit will allow allocation of more of the available memory (for visual studio: ).
Using vectors instead of arrays, as suggested by many here, is a helpful way to avoid a common route to leaks (and brings other conveniences), but let’s say you have a memory leak and you have arrays. Since arrays are not necessarily the cause of the leak (obviously), switching to vectors might not save you. Looking at how the arrays are deleted is a good start, though. Here is what I gathered about how to properly delete an array of pointers to objects:
// Let's say we have
objType **objAr = new objType *[objNum];
for(int i = 0; i < objNum; i++)
{
    objAr[i] = new objType();
}
// To delete, first delete each object the pointers point to:
for(int i = 0; i < objNum; i++)
{
    delete objAr[i];
}
// If instead of an array of pointers we had just
// an array of objects, the loop wouldn't be needed.
// Then delete the array itself:
delete [] objAr;
objAr = 0;
Ironically, a source of leakage in my code was improper deletion of a vector of pointers to objects. For such a vector I needed to first delete the objects element by element and then call vec.clear(). Doing just the latter was causing a memory leak.
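For reference, here is a minimal sketch of that cleanup pattern for a vector of owning raw pointers (the names are illustrative, not taken from my actual code):

#include <cstddef>
#include <vector>

std::vector<objType*> vec;
// ... vec gets filled with new-ed objects ...
for (std::size_t i = 0; i < vec.size(); i++)
{
    delete vec[i]; // first delete each object the pointers own
}
vec.clear();       // then drop the (now dangling) pointers themselves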

Look at how many times you use new. You use it once to allocate the array of pointers (new vertex *[MAX_AR_LEN]) and then you use it v_num times to allocate each vertex. To avoid memory leaks, you have to use delete the same number of times you use new, so that you deallocate everything you allocated.
You're going to have to loop through your array of pointers and call delete vertexes[i] on each one before deleting the array itself.
However, if you had used a std::vector<vertex>, you would not have to deal with this manual memory allocation and would avoid these kinds of problems.
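As a rough sketch of what that could look like (this reuses the vertex class and the names from the question, so it is not a drop-in snippet):

#include <vector>

std::vector<vertex> vertexes;        // owns the vertex objects directly
vertexes.reserve(v_num);             // one up-front allocation instead of MAX_AR_LEN raw pointers
for (int i = 0; i < v_num; i++)
    vertexes.emplace_back(strName);  // constructs each vertex in place
// No destructor code needed: the vector releases everything automatically.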
Note that the plural of "vertex" is "vertices".

Related

Reducing memory footprint of c++ program utilising large vectors

In scaling up the problem size I'm handing to a self-coded program, I started to bump into Linux's OOM killer. Both Valgrind (when run on the CPU) and cuda-memcheck (when run on the GPU) do not report any memory leaks. Memory usage keeps expanding while iterating through the inner loop, even though I explicitly clear the vectors holding the biggest chunk of data at the end of this loop. How can I make this memory hogging disappear?
Checks for memory leaks were performed and all reported leaks were fixed. Despite this, out-of-memory errors keep killing the program (via the OOM killer). Manual monitoring of memory consumption shows an increase in memory utilisation, even after explicitly clearing the vectors containing the data.
The key thing to know is that there are three nested loops, the outer one iterating over the sub-problems at hand. The middle loop runs the Monte Carlo trials, and an inner loop runs a sequential process required inside each trial. The pseudo-code looks as follows:
std::vector<object*> sub_problems;
sub_problems.push_back(retrieved_subproblem_from_database);
for(int sub_problem_index = 0; sub_problem_index < sub_problems.size(); ++sub_problem_index){
    std::vector< std::vector<float> > mc_results(100000, std::vector<float>(5, 0.0));
    for(int mc_trial = 0; mc_trial < 100000; ++mc_trial){
        for(int sequential_process_index = 0; sequential_process_index < 5; ++sequential_process_index){
            mc_results[mc_trial][sequential_process_index] = specific_result;
        }
    }
    sub_problems[sub_problem_index]->storeResultsInObject(mc_results);
    // Do some other things
    sub_problems[sub_problem_index]->deleteMCResults();
}
deleteMCResults looks as follows:
bool deleteMCResults() {
    for (int i = 0; i < asset_values.size(); ++i){
        object_mc_results[i].clear();
        object_mc_results[i].shrink_to_fit();
    }
    object_mc_results.clear();
    object_mc_results.shrink_to_fit();
    return true;
}
How can I make memory consumption depend solely on the middle and inner loops instead of the outer loop? The second, third, fourth, and so on, iterations could theoretically use exactly the same memory space/addresses as the first iteration.
Perhaps I'm reading your pseudocode too literally, but it looks like you have two results variables: mc_results, declared inside the for loop, and object_mc_results, which deleteMCResults is accessing.
In any case, I have two suggestions for how to debug this. First, rather than letting the OOM killer strike, which takes a long time, is unpredictable, and might kill something important, use ulimit -v to put a limit on process size. Set it to something reasonable like, say, 1000000 (about 1GB) and work on keeping your process under that.
Second, start deleting or commenting out everything except the parts of the program that allocate and deallocate memory. Either you will find your culprit or you will make a program small enough to post in its entirety.
deleteMCResults() can be written a lot more simply:
void deleteMCResults() {
    decltype(object_mc_results) empty;
    std::swap(object_mc_results, empty);
}
But in this case, I'm wondering if you really want to release the memory. As you say, the iterations could reuse the same memory, so perhaps you should replace deleteMCResults() with returnMCResultsMemory(). Then hoist the declaration of mc_results out of the loop, and just reset its values to 0.0 after returnMCResultsMemory() returns.
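A rough sketch of that reuse pattern, continuing the pseudo-code above (specific_result and sub_problems as before, returnMCResultsMemory() is the hypothetical replacement just suggested, and std::fill comes from <algorithm>):

// Hoisted buffer, reused by every sub-problem instead of being reallocated.
std::vector< std::vector<float> > mc_results(100000, std::vector<float>(5, 0.0));
for(int sub_problem_index = 0; sub_problem_index < sub_problems.size(); ++sub_problem_index){
    for(int mc_trial = 0; mc_trial < 100000; ++mc_trial)
        for(int sequential_process_index = 0; sequential_process_index < 5; ++sequential_process_index)
            mc_results[mc_trial][sequential_process_index] = specific_result;
    sub_problems[sub_problem_index]->storeResultsInObject(mc_results);
    // Do some other things
    sub_problems[sub_problem_index]->returnMCResultsMemory(); // hands results back without freeing the buffer
    for(auto& row : mc_results)
        std::fill(row.begin(), row.end(), 0.0f); // reset values in place, no reallocation
}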
There is one thing that could easily be improved in the code you show. However, there is really not enough, and not precise enough, information here to make a full analysis. Extracting a relevant example ([mcve]) and perhaps asking for a review on codereview.stackexchange.com might improve the outcome.
The simple thing that could be done is to replace the inner vector of five floats with an array of five floats. Each vector consists (in typical implementations) of three pointers: one to the beginning and one to the end of the allocated memory, and another to mark the used amount. The actual storage requires a separate allocation, which in turn incurs some overhead (and also a performance overhead when accessing the data, keyword "locality of reference"). These three pointers require 24 octets on a common 64-bit machine. Compare that with five floats, which only require 20 octets. Even if those floats were padded to 24 octets, you would still benefit from eliding the separate allocation.
In order to try this out, just replace the inner vector with a std::array (https://en.cppreference.com/w/cpp/container/array). Odds are that you won't have to change much code; raw arrays, std::array and std::vector have very similar interfaces.
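A minimal sketch of that change applied to the buffer from the question:

#include <array>
#include <vector>

// Before: std::vector< std::vector<float> > mc_results(100000, std::vector<float>(5, 0.0));
std::vector< std::array<float, 5> > mc_results(100000); // one contiguous allocation, no per-row heap storage
// Element access is unchanged: mc_results[mc_trial][sequential_process_index] = ...;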

Why does copying an instance of an object in a loop take a huge amount of memory in C++?

I have written a program which works iteratively to find a solution. I initially used vectors to hold instances of an object. It worked fine, but I preferred to have an instance of the class as the primary object and a temp object that is made in a while loop through a kind of instance copying. It works, but it is slower and occupies almost twice as much RAM (e.g. 980 MB before and about 1.6 GB after that change). Why? I really have no idea. I took the line of "copying" (which is not technically a copy constructor, but works the same way) out of the loop and it works as expected with the expected RAM usage, so the problem arises when the "copying line" is inside the loop. Any idea why this happens?
A simplified preview of the code:
void SPSA::beginIteration(Model ins, Inventory inv_Data, vector <Result> &res)
{
    bool icontinue=true;
    while(icontinue)
    {
        Model ins_temp(&ins, &inv_Data);
        if(model_counter>0)
            ins_temp.setDecesionVariableIntoModel(decisionVariable);
        //something useful here
        model_counter++;
    }
}
The code above occupies a lot of RAM, but the code below is fine:
void SPSA::beginIteration(Model ins, Inventory inv_Data, vector <Result> &res)
{
    bool icontinue=true;
    Model ins_temp(&ins, &inv_Data);
    while(icontinue)
    {
        if(model_counter>0)
            ins_temp.setDecesionVariableIntoModel(decisionVariable);
        //something useful here
        model_counter++;
    }
}
By the way, I'm compiling using mingw++.
Thanks
The main difference is that the first version copies the Model N times (once for each execution of the loop body, depending on when icontinue is set) rather than once.
First, try to reduce the problem:
while(1) Model ins_temp(&ins, &inv_Data);
If that eats memory as well (significantly), then it's a problem with Model.
(The above loop may eat a little memory due to fragmentation, depending on how Model is implemented.)
Most likely cause (without additional information) is a memory leak in Model.
Other possibilities include: Model uses lazy memory release (similar to a garbage collector), Model uses shared pointers and creating more than one instance causes circular references, or you are running into a very ugly, very bad memory fragmentation problem (extremely unlikely).
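For the shared-pointer case, here is a minimal illustration of how a reference cycle keeps memory alive (the types are made up for the example, not taken from Model); the usual fix is to make one of the links a std::weak_ptr:

#include <memory>

struct Node {                        // hypothetical type, purely for illustration
    std::shared_ptr<Node> other;     // owning link
};

void makeCycle()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;
    b->other = a;                    // cycle: each keeps the other's use count above zero
}                                    // a and b go out of scope, but neither Node is ever destroyed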
How to proceed
The "pro" solution would be using a memory profiler (options for mingw).
Alternatively, study your code of Model for leaks, or reduce the implementation of Model until you find the minimal change that makes the leak go away.
Don't be scared immediately if your code appears to use a lot of memory. It doesn't necessarily mean there's a memory leak. C++ may not always return memory to the OS, but keep it on hand for future allocations. It only counts as a memory leak if C++ loses track of the allocation status and cannot use that memory for future allocations anymore.
The fact that the memory doubles regardless of the iteration count suggests that the memory is recycled not on the very first opportunity, but on every second allocation. Presumably you end up with two big allocations which are used alternatingly.
It's even possible that if the second allocation had failed, the first block would have been recycled, so you use 1.6 GB of memory because it's there.

Managing a Contiguous Chunk of Memory without Malloc/New or Free/Delete

How would one go about creating a custom MemoryManager to manage a given, contiguous chunk of memory without the aid of other memory managers (such as Malloc/New) in C++?
Here's some more context:
MemManager::MemManager(void* memory, unsigned char totalsize)
{
    Memory = memory;
    MemSize = totalsize;
}
I need to be able to allocate and free up blocks of this contiguous memory using a MemManager. The constructor is given the total size of the chunk in bytes.
An Allocate function should take in the amount of memory required in bytes and return a pointer to the start of that block of memory. If no memory is remaining, a NULL pointer is returned.
A Deallocate function should take in the pointer to the block of memory that must be freed and give it back to the MemManager for future use.
Note the following constraints:
-Aside from the chunk of memory given to it, the MemManager cannot use ANY dynamic memory
-As originally specified, the MemManager CANNOT use other memory managers to perform its functions, including new/malloc and delete/free
I have received this question on several job interviews already, but even hours of researching online did not help me and I have failed every time. I have found similar implementations, but they have all either used malloc/new or were general-purpose and requested memory from the OS, which I am not allowed to do.
Note that I am comfortable using malloc/new and free/delete and have little trouble working with them.
I have tried implementations that utilize node objects in a LinkedList fashion that point to the block of memory allocated and state how many bytes were used. However, with those implementations I was always forced to create new nodes onto the stack and insert them into the list, but as soon as they went out of scope the entire program broke since the addresses and memory sizes were lost.
If anyone has some sort of idea of how to implement something like this, I would greatly appreciate it. Thanks in advance!
EDIT: I forgot to directly specify this in my original post, but the objects allocated with this MemManager can be different sizes.
EDIT 2: I ended up using homogeneous memory chunks, which was actually very simple to implement thanks to the information provided in the answers below. The exact rules regarding the implementation itself were not specified, so I split the memory into 8-byte blocks. If the user requested more than 8 bytes, I would be unable to give it, but if the user requested fewer than 8 bytes (but > 0) then I would give extra memory. If the amount of memory passed in was not divisible by 8, there would be wasted memory at the end, which I suppose is much better than using more memory than you're given.
I have tried implementations that utilize node objects in a LinkedList
fashion that point to the block of memory allocated and state how many
bytes were used. However, with those implementations I was always
forced to create new nodes onto the stack and insert them into the
list, but as soon as they went out of scope the entire program broke
since the addresses and memory sizes were lost.
You're on the right track. You can embed the LinkedList node in the block of memory you're given with reinterpret_cast<>. Since you're allowed to store variables in the memory manager as long as you don't dynamically allocate memory, you can track the head of the list with a member variable. You might need to pay special attention to object size (Are all objects the same size? Is the object size greater than the size of your linked list node?)
Assuming the answers to the previous questions to be true, you can then process the block of memory and split it off into smaller, object sized chunks using a helper linked list that tracks free nodes. Your free node struct will be something like
struct FreeListNode
{
    FreeListNode* Next;
};
When allocating, all you do is remove the head node from the free list and return it. Deallocating is just inserting the freed block of memory into the free list. Splitting the block of memory up is just a loop:
// static_cast only needed if the constructor takes a void pointer; can't perform pointer arithmetic on void*
char* memoryStart = static_cast<char*>(memory);
char* memoryEnd = memoryStart + totalSize;
for (char* blockStart = memoryStart; blockStart < memoryEnd; blockStart += objectSize)
{
    FreeListNode* freeNode = reinterpret_cast<FreeListNode*>(blockStart);
    freeNode->Next = freeListHead;
    freeListHead = freeNode;
}
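For the fixed-size case, Allocate() and Deallocate() then reduce to popping and pushing the free list. A minimal sketch (freeListHead is the illustrative member variable mentioned above, and the requested byte count is ignored or checked against the block size):

void* MemManager::Allocate()
{
    if (freeListHead == nullptr)
        return nullptr;                  // no free blocks left
    FreeListNode* node = freeListHead;
    freeListHead = freeListHead->Next;   // pop the head of the free list
    return node;                         // the node's own memory is the block handed out
}

void MemManager::Deallocate(void* block)
{
    FreeListNode* node = reinterpret_cast<FreeListNode*>(block);
    node->Next = freeListHead;           // push the block back onto the free list
    freeListHead = node;
}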
As you mentioned, the Allocate function takes in the object size, so the above will need to be modified to store metadata. You can do this by including the size of the free block in the free list node data. This removes the need to split up the initial block, but introduces complexity in Allocate() and Deallocate(). You'll also need to worry about memory fragmentation, because if you don't have a free block with enough memory to store the requested amount, there's nothing you can do other than fail the allocation. A couple of Allocate() algorithms might be:
1) Just return the first available block large enough to hold the request, updating the free block as necessary. This is O(n) in terms of searching the free list, but might not need to search a lot of free blocks and could lead to fragmentation problems down the road.
2) Search the free list for the smallest free block that can still hold the request. This is still O(n) in terms of searching the free list, because you have to look at every node to find the least wasteful one, but it can help delay fragmentation problems.
Either way, with variable sizes, you have to store metadata for allocations somewhere as well. If you can't dynamically allocate at all, the best place is before or after the user-requested block; you can add features to detect buffer overflows/underflows during Deallocate() if you add padding that is initialized to a known value and check the padding for changes. You can also add a compaction step as mentioned in another answer if you want to handle that.
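One way to picture the "metadata before the user block" idea for variable-size allocations (the field names are made up, and this is only a sketch, not a complete allocator):

#include <cstddef>

struct AllocationHeader
{
    std::size_t blockSize;   // size handed out, needed again at Deallocate() time
    // optional: guard bytes set to a known pattern so Deallocate() can detect overflows
};

// Allocate(n) carves sizeof(AllocationHeader) + n bytes out of a free block,
// fills in the header, and returns the address just past the header.
// Deallocate(p) steps back sizeof(AllocationHeader) bytes to recover the metadata
// and links the whole region back into the free list.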
One final note: you'll have to be careful when adding metadata to the FreeListNode helper struct, as the smallest free block size allowed is sizeof(FreeListNode). This is because you are storing the metadata in the free memory block itself. The more metadata you find yourself needing to store for your internal purposes, the more wasteful your memory manager will be.
When you manage memory, you generally want to use the memory you manage to store any metadata you need. If you look at any of the implementations of malloc (ptmalloc, phkmalloc, tcmalloc, etc...), you'll see that this is how they're generally implemented (neglecting any static data of course). The algorithms and structures are very different, for different reasons, but I'll try to give a little insight into what goes into generic memory management.
Managing homogeneous chunks of memory is different than managing non-homogeneous chunks, and it can be a lot simpler. An example...
MemoryManager::MemoryManager() {
    this->map = std::bitset<count>();
    this->mem = malloc(size * count);
    for (int i = 0; i < count; i++)
        this->map.set(i);
}
Allocating is a matter of finding the next set bit in the std::bitset (the compiler might optimize this), marking the chunk as allocated and returning it. De-allocation just requires calculating the index and marking it as unallocated. A free list is another way (what's described here), but it's a little less memory efficient, and might not use the CPU cache well.
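A rough sketch of allocate/deallocate for that bitset scheme, continuing the member names from the constructor above (with a set bit meaning the chunk is free, as initialized there):

void* MemoryManager::allocate() {
    for (int i = 0; i < count; i++) {
        if (this->map.test(i)) {                          // set bit: chunk i is free
            this->map.reset(i);                           // mark it allocated
            return static_cast<char*>(this->mem) + i * size;
        }
    }
    return nullptr;                                       // no free chunk left
}

void MemoryManager::deallocate(void* p) {
    int i = (static_cast<char*>(p) - static_cast<char*>(this->mem)) / size;
    this->map.set(i);                                     // mark the chunk free again
}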
A free list can be the basis for managing non-homogeneous chunks of memory, though. With this, you need to store the size of each chunk in addition to the next pointer in the chunk of memory. The size lets you split larger chunks into smaller chunks. This generally leads to fragmentation, though, since merging chunks is non-trivial. This is why most data structures keep lists of same-sized chunks and try to map requests as closely as possible.

std::sort on container of pointers

I want to explore the performance differences for multiple dereferences of data inside a vector of newly allocated structs (or classes).
struct Foo
{
    int val;
    // some variables
};
std::vector<Foo*> vectorOfFoo;
// Foo objects are new-ed and pushed into vectorOfFoo
for (int i = 0; i < N; i++)
{
    Foo *f = new Foo;
    vectorOfFoo.push_back(f);
}
In the parts of the code where I iterate over the vector, I would like to enhance locality of reference through the many iterator dereferences. For example, I very often have to perform a double nested loop:
for (vector<Foo*>::iterator iter1 = vectorOfFoo.begin(); iter1 != vectorOfFoo.end(); ++iter1)
{
    int somevalue = (*iter1)->val;
}
Obviously, if the pointers inside vectorOfFoo point to objects that are far apart, I think locality of reference is somewhat lost.
What about performance if I sort the vector before iterating over it? Would I get better performance from the repeated dereferences?
Am I guaranteed that consecutive new calls allocate objects that are close to each other in the memory layout?
Just to answer your last question: no, there is no guarantee whatsoever about where new allocates memory. The allocations can be scattered throughout memory. Depending on the current fragmentation of the memory, you may be lucky and they are sometimes close to each other, but no guarantee is - or, actually, can be - given.
If you want to improve the locality of reference for your objects then you should look into Pool Allocation.
But that's pointless without profiling.
It depends on many factors.
First, it depends on how the objects being pointed to from the vector were allocated. If they were allocated on different pages, then you cannot do much except fix the allocation part and/or try to use software prefetching.
You can generally check what virtual addresses malloc gives out, but as part of a larger program the result of separate allocations is not deterministic. So if you want to control the allocation, you have to do it smarter.
In the case of a NUMA system, you have to make sure that the memory you are accessing is allocated from the physical memory of the node on which your process is running. Otherwise, no matter what you do, the memory will be coming from the other node, and there is not much you can do in that case except move your program back to its "home" node.
You have to check the stride that is needed in order to jump from one object to another. The prefetcher can recognize a stride within a 512-byte window. If the stride is greater, you are talking about random memory access from the prefetcher's point of view. Then it will shut off so as not to evict your data from the cache, and the best you can do there is to try software prefetching, which may or may not help (always test it).
So if sorting the vector of pointers makes the objects they point to sit contiguously one after another with a relatively small stride, then yes, you will improve the memory access speed by making it more friendly to the prefetch hardware.
You also have to make sure that sorting the vector doesn't result in a worse gain/loss ratio.
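If you do want to try it, sorting the pointers by the addresses they hold is a one-liner; whether it helps depends entirely on how the allocator laid out the objects, so measure before and after:

#include <algorithm>
#include <functional>
#include <vector>

// std::less<Foo*> gives a total order over pointers, so iteration then visits
// the pointed-to Foo objects in increasing address order.
std::sort(vectorOfFoo.begin(), vectorOfFoo.end(), std::less<Foo*>());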
On a side note, depending on how you use each element, you may want to allocate them all at once and/or split those objects into different smaller structures and iterate over smaller data chunks.
At any rate, you absolutely must measure the performance of the whole application before and after your changes. This sort of optimization is a tricky business, and things can get worse even though in theory the performance should have improved. There are many tools that can help you profile memory access, for example cachegrind. Intel's VTune does the same, and there are many other tools. So don't guess; experiment and verify the results.

Is this normal behavior for a std::vector?

I have a std::vector of a class called OGLSHAPE.
Each shape has a vector of SHAPECONTOUR structs, each of which has a vector of float and a vector of vector of double. It also has a vector of an outline struct which has a vector of float in it.
Initially, my program starts up using 8.7 MB of RAM. I noticed that when I started filling these up, e.g. adding doubles and floats, the memory usage got fairly high quickly, then leveled off. When I clear the OGLSHAPE vector, about 19 MB is still used. Then if I push about 150 more shapes and clear those, I'm now using around 19.3 MB of RAM. I would have thought that, logically, if the first time it went from 8.7 to 19, the next time it would go up to around 30. I'm not sure what is going on. I thought it was a memory leak, but now I'm not sure. All I do is push numbers into std::vectors, nothing else, so I'd expect to get all my memory back. What could cause this?
Thanks
*edit: okay, it's memory fragmentation from allocating lots of small things; how can that be solved?
Calling std::vector<>::clear() does not necessarily free all allocated memory (it depends on the implementation of the std::vector<>). This is often done for the purpose of optimization, to avoid unnecessary memory allocations.
In order to really free the memory held by an instance just do:
template <typename T>
inline void really_free_all_memory(std::vector<T>& to_clear)
{
    std::vector<T> v;
    v.swap(to_clear);
}
// ...
std::vector<foo> objs;
// ...
// really free instance 'objs'
really_free_all_memory(objs);
which creates a new (empty) instance and swaps it with your vector instance you would like to clear.
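For what it's worth, C++11 later added a more direct (though non-binding) way to ask for the same thing:

objs.clear();
objs.shrink_to_fit(); // non-binding request to release the unused capacity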
Use the correct tools to observe your memory usage, e.g. (on Windows) use Process Explorer and observe Private Bytes. Don't look at Virtual Address Space, since that shows the highest memory address in use. Fragmentation is the cause of a big difference between the two values.
Also realize that there are a lot of layers in between your application and the operating system:
the std::vector does not necessarily free all memory immediately (see tip of hkaiser)
the C Run Time does not always return all memory to the operating system
the Operating System's heap routines may not be able to free all memory because they can only free full pages (of 4 KB). If 1 byte of a 4 KB page is still used, the page cannot be freed.
There are a few possible things at play here.
First, the way memory works in most common C and C++ runtime libraries is that once it is allocated to the application from the operating system, it is rarely given back to the OS. When you free it in your program, the runtime's memory manager keeps it around in case you ask for more memory again. If you do, it hands the same memory back for re-use.
The other reason is that vectors themselves typically don't reduce their size, even if you clear() them. They keep the "capacity" that they had at their highest so that it is faster to re-fill them. But if the vector is ever destroyed, that memory will then go back to the runtime library to be allocated again.
So, if you are not destroying your vectors, they may be keeping the memory internally for you. If you are using something in the operating system to view memory usage, it is probably not aware of how much "free" memory is waiting around in the runtime libraries to be used, rather than being given back to the operating system.
The reason your memory usage increases slightly (instead of not at all) is probably because of fragmentation. This is a sort of complicated tangent, but suffice it to say that allocating a lot of small objects can make it harder for the runtime library to find a big chunk when it needs it. In that case, it can't reuse some of the memory it has laying around that you already freed, because it is in lots of small pieces. So it has to go to the OS and request a big piece.