I have the following code to test memory deallocation using a std::list container:
#include <iostream>
#include <list>
#include <string>
#include <boost/bind.hpp>
/* count of element to put into container
*/
static const unsigned long SIZE = 50000000;
/* element use for test
*/
class Element
{
public:
Element()
: mId(0)
{}
Element( long id )
: mId(id)
{}
virtual ~Element()
{
}
inline long getId() const
{
return this->mId;
}
inline bool operator<( const Element & rightOperand ) const
{
return this->mId < rightOperand.mId;
}
inline bool isEven() const
{
return 0 == ( this->mId & 1 );
}
private:
long mId;
};
typedef std::list< Element > Elements;
int main( int argc, char * argv[] )
{
std::string dummy;
{
Elements elements;
std::cout << "Inserting "<< SIZE << " elements in container" << std::endl;
std::cout << "Please wait..." << std::endl;
/* inserting elements
*/
for( long i=0; i<SIZE; ++i )
{
elements.push_back( i );
}
std::cout << "Size is " << elements.size() << std::endl;
std::getline( std::cin, dummy); // waiting user press enter
/* remove even elements
*/
elements.remove_if( boost::bind( & Element::isEven, _1 ) );
std::cout << "Size is " << elements.size() << std::endl;
std::getline( std::cin, dummy);
}
std::getline( std::cin, dummy);
return 0;
}
Running this code gives me the following memory profile:
It looks like gcc is deferring deallocation; in my test program it has no choice at the end and deallocates the memory just before returning to the command line.
Why does deallocation happen so late?
I also tried a vector to test another container: the shrink-to-fit trick works there and releases the freed memory when I expect it to.
gcc 4.5.0, linux 2.6.34
Most operating systems (including Linux) only allow processes to allocate quite large chunks of memory, and not very small ones; even where small requests are possible, many small allocations are most likely more expensive than a few large ones. Generally, the C++ library will acquire large chunks from the operating system and use its own heap manager to hand out small pieces of them to the program. The large chunks will usually not be returned to the operating system once they've been divided up like that; they remain allocated to the process and are reused for future allocations.
list allocates memory in small chunks (one per node), so the allocated memory usually won't be released until the program exits. vector might get its memory as a single large allocation directly from the operating system, in which case it will be released when it's deallocated.
What exactly is your graph showing? The destructor of std::list
deallocates all of the memory, so that it can be reused elsewhere in the
program, but deallocation won't necessarily return the memory to the
system, where it can be used by other processes. Historically, at
least, under Unix, once memory has been allocated to a process, it
remains with that process until the process terminates. Newer
algorithms may be able to actually return memory to the OS, but even
then, things like fragmentation may prevent it from doing so—if
you allocate, then free a really large block, it may be returned, but if
you allocate a lot of little blocks (which is what std::list does),
the runtime will in fact allocate large blocks from the OS, which it
parcels out; such large blocks cannot be returned until all small blocks
in them have been freed, and likely won't be returned even then.
It depends on how you're measuring the memory usage. If it's measuring the process memory in use, this is what you might actually expect.
It's quite common for a program to request memory and have that assigned from the controlling environment (such as an operating system) to the process but, when the memory is freed, it doesn't necessarily get taken away from the process. It may be returned to a free pool within the process.
This was the way allocation used to work in the olden days. A call to brk or sbrk would increase the size of the heap by giving more memory to the process. That memory would be added to the arena from which malloc calls were satisfied.
But, free would return the memory to the arena, not necessarily back to the operating system.
I imagine something similar is happening in this case.
Your memory profile is actually the process's address space consumption (the sum of mmap-ed pages, as e.g. given by /proc/self/statm or /proc/self/maps from the point of view of the process itself).
But when a C or C++ function releases memory (previously allocated with malloc or new, which use mmap to get memory from the Linux kernel) using free or delete, it is not given back to the system with munmap, because that would be too slow or impractical (fragmentation issues); it is just kept as reusable for future malloc or new calls.
So deallocation did happen when requested by free; the memory is simply kept for future reuse instead of being given back to the system.
If you really wanted the memory to be given back, write your own allocator (above mmap and munmap) but usually it is not worth the effort.
Perhaps using Boehm's GC could help (it is very useful, to avoid bothering about free-ing or delete -ing) if you explicitly call GC_gcollect() (but I am not sure of that), but you really should not care that much.
And your question is not technically related to gcc (it would be the same with another C++ compiler). It is related to malloc and new (i.e. to standard C & C++ libraries) under Linux.
This is the first time I am trying to use std::unique_ptr, but I am getting an access violation
when using std::make_unique with a large size.
What is the difference in this case, and is it possible to catch this type of exception in C++?
void SmartPointerfunction(std::unique_ptr<int>&Mem, int Size)
{
try
{
/*declare smart pointer */
//Mem = std::unique_ptr<int>(new int[Size]); // using new (No crash)
Mem = std::make_unique<int>(Size); // using make_unique (crash when Size = 10000!!)
/*set values*/
for (int k = 0; k < Size; k++)
{
Mem.get()[k] = k;
}
}
catch(std::exception& e)
{
std::cout << "Exception :" << e.what() << std::endl;
}
}
When you invoke std::make_unique<int>(Size), you allocate memory of size sizeof(int) (commonly 4 bytes) and initialize it as a single int with the value Size. Because only one int is allocated, Mem.get()[k] touches addresses that are out of bounds for every k greater than 0.
But going out of bounds doesn't mean your program crashes immediately. As you may know, the addresses a program touches are virtual memory addresses, so consider the layout of a process's virtual address space.
The address space is divided into several segments (stack, heap, bss, etc.). When we request dynamic memory, the returned address is usually located in the heap segment (I say "usually" because the allocator sometimes uses mmap, in which case the address lies in the memory-mapped area between the stack and the heap).
The dynamic memory blocks we obtain are not contiguous with each other, but the heap itself is a contiguous segment, and from the OS's point of view any access inside the heap segment is legal. Managing that segment is exactly the allocator's job: it divides the heap into blocks, some marked "used" and some marked "free". When we request dynamic memory, the allocator looks for a free block that can hold the requested size (splitting off a smaller block if the free block is much larger than needed), marks it as used, and returns its address. If no such free block can be found, the allocator calls sbrk to grow the heap.
Even if we access an address outside the block we were given, the OS regards the access as legal as long as it falls within the heap, although it may overwrite data in a used block or write into a free block. But if the address lies outside the heap, for example beyond the program break in an unmapped page, the OS raises a segmentation fault and the program crashes immediately.
So the crash is not caused by the argument to std::make_unique<int> as such. It just so happens that when you specify 10000, the addresses you access run past the end of the segment.
std::make_unique<int>(Size);
This doesn't do what you are expecting!
It creates a single int and initializes it to the value Size!
I'm pretty sure your plan was to do:
auto p = std::make_unique<int[]>(Size);
Note the extra brackets. Also note that the result type is different: it is not std::unique_ptr<int> but std::unique_ptr<int[]>, and for this type operator[] is provided!
Fixed version, but IMO you should use std::vector.
My understanding is that memory allocated on the free store (the heap) should grow upwards as I allocate additional free store memory; however, when I run my code, occasionally the memory location of the next object allocated on the free store will be a lower value. Is there an error with my code, or could someone please explain how this could occur? Thank you!
int main()
{
int* a = new int(1);
int* b = new int(1);
int* c = new int(1);
int* d = new int(1);
cout << "Free Store Order: " << int(a) << " " << int(b) << " " << int(c) << " " << int(d) << '\n';
// An order I found: 13011104, 12998464, 12998512, 12994240
delete a;
delete b;
delete c;
delete d;
return 0;
}
The main problem with that code is that you are casting int * to int, an operation that may lose precision, and therefore give you incorrect results.
But, aside from that, this statement is a misapprehension:
My understanding is that memory allocated on the free store (the heap) should grow upwards as I allocate additional free store memory.
There is no guarantee that new will return objects with sequential addresses, even if they're the same size and there have been no previous allocations. A simple allocator may well do that but it is totally free to allocate objects in any manner it wants.
For example, it may allocate in a round-robin fashion from multiple arenas to reduce resource contention. I believe the jemalloc implementation does this, albeit on a per-thread basis.
Or maybe it has three fixed-address 128-byte buffers to hand out for small allocations so that it doesn't have to fiddle about with memory arenas in programs with small and short-lived buffers. That means the first three will be specific addresses outside the arena, while the fourth is "properly" allocated from the arena.
Yes, I know that may seem a contrived situation but I've actually done something similar in an embedded system where, for the vast majority of allocations, there were fewer than 64 128-byte allocations in flight at any given time.
Using that method means that most allocations were blindingly fast, using a count and bitmap to figure out free space in the fixed buffers, while still being able to handle larger needs (> 128 bytes) and overflows (> 64 allocations).
And deallocations simply detected if you were freeing one of the fixed blocks and marked it free, rather than having to return it to the arena and possibly coalesce it with adjacent free memory sections.
In other words, something like (with suitable locking to prevent contention, of course):
def free(address):
    if address is one of the fixed buffers:
        set free bit for that buffer to true
        increment fixed buffer free count
        return
    call realFree(address)
def alloc(size):
    if size is greater than 128 or fixed buffer free count is zero:
        return realAlloc(size)
    find first free fixed buffer
    decrement fixed buffer free count
    set free bit for that buffer to false
    return address of that buffer
The bottom line is that the values returned by new have certain guarantees but ordering is not one of them.
Below is my question and code:
When the code reaches line 26, the memory obtained by this process is not returned to the OS. Why?
But if I delete line 16, the memory is released correctly?
I know that using so many small memory blocks is not the usual approach, but I am very curious to know the reason.
I have run this program with MALLOC_MMAP_MAX_=1000000 MALLOC_MMAP_THRESHOLD_=1024, but nothing changed.
int i = 0;
std::cout << "waiting for input, you can check current memory" << std::endl;
std::cin >> i;
char** ptr = new char *[1000000];
std::map<int, char *> tMap;
for (unsigned long i = 0; i < 1000000; i ++)
{
ptr[i] = new char[3000];
tMap.insert(make_pair(i, ptr[i])); //line 16
}
std::cout << "waiting for input, you can check current memory" << std::endl;
std::cin >> i;
for (unsigned long i = 0; i < 1000000; i ++)
{
delete []ptr[i];
}
delete []ptr;
std::cout << "waiting for input, you can check current memory" << std::endl;
std::cin >> i; //line 26
return 0;
Here is more material. I have also checked the memory used by tMap: it is less than 100 MB.
1. Allocate the memory and stop, then check resident memory (RES):
holds 2.9 GB of memory
2. Deallocate the memory and stop, then check resident memory (RES):
holds 2.9 GB of memory
C++ doesn't have garbage collection, so keeping an extra copy of a pointer doesn't stop the memory from being deallocated.
What happens after delete[] ptr[i] is that the map is full of dangling pointers that can no longer be used.
Another thought: What you might see as a memory leak is the fact that the tMap also allocates dynamic memory to store the inserted data. That memory will be released when the map goes out of scope, just after line 27.
1. When the code reaches line 26, the memory obtained by this process is not returned to the OS?
There's no guarantee any memory will be released by a C++ program to the operating system just because it's deleted properly by the program. In many C++ runtimes, dynamically allocated memory that's deleted will still be reserved for future use by the same program, and not released to the OS. GCC/Linux is an example of a compiler/runtime environment where larger allocations are usually served from separately mapped memory regions that can be released to the operating system before the program terminates, such that the OS or other programs can use them.
2. But if I delete line 16, the memory will be released correctly?
Line 16 doesn't make any difference to the later deletion/deallocation of the memory at line 22 (which may return it to the pool of dynamic memory that the application may later re-allocate, or actually release it to the OS as mentioned above). It does involve more dynamic allocations for the std::map elements themselves, though.
Note that the tMap destructor does not itself delete or release the memory the stored pointers refer to. To have memory automatically released, either by pointer-like variables or by containers thereof, use smart pointers such as std::shared_ptr or std::unique_ptr (you can google them for information).
When using a very large vector of vectors we've found that part of the memory is not released.
#include <iostream>
#include <vector>
#include <unistd.h>
void foo()
{
std::vector<std::vector<unsigned int> > voxelToPixel;
unsigned int numElem = 1<<27;
voxelToPixel.resize( numElem );
for (unsigned int idx=0; idx < numElem; idx++)
voxelToPixel.at(idx).push_back(idx);
}
int main()
{
foo();
std::cout << "End" << std::endl;
sleep(30);
return 0;
}
That leaves around 4GB of memory hanging until the process ends.
If we change the for line to
for (unsigned int idx=0; idx < numElem; idx++)
voxelToPixel.at(0).push_back(idx);
the memory is released.
Using gcc-4.8 on a linux machine. We've used htop to track the memory usage on a computer with 100 GB of RAM. You will need around 8 GB of RAM to run the code. Can you reproduce the problem? Any ideas on why that is happening?
EDIT:
We've seen that this does not happen on a Mac (with either gcc or clang). Also, on Linux, the memory is freed if we call foo twice (but the problem appears again on the third call).
Small allocations (up to 128 KB by default, I think) are managed by an in-process heap, and are not returned to the OS when they're deallocated; they're returned to the heap for reuse within the process. Larger allocations come directly from the OS (by calling mmap), and are returned to the OS when deallocated.
In your first example, each vector only needs to allocate enough space for a single int. You have over a hundred million small allocations, none of which will be returned to the OS.
In the second example, as the vector grows, it will make many allocations of various sizes. Some are smaller than the mmap threshold, and these will remain in the process memory; but, since you only do this to one vector, that won't be a huge amount. If you were to use resize or reserve to allocate all the memory for each vector before populating it, then you should find that all the memory is returned to the OS.
I've been having trouble with a memory leak in a large-scale project I've been working on, but the project has no leaks according to the VS2010 memory checker (and I've checked everything extensively).
I decided to write a simple test program to see if the leak would occur on a smaller scale.
struct TestStruct
{
std::string x[100];
};
class TestClass
{
public:
std::vector<TestStruct*> testA;
//TestStruct** testA;
TestStruct xxx[100];
TestClass()
{
testA.resize(100, NULL);
//testA = new TestStruct*[100];
for(unsigned int a = 0; a < 100; ++a)
{
testA[a] = new TestStruct;
}
}
~TestClass()
{
for(unsigned int a = 0; a < 100; ++a)
{
delete testA[a];
}
//delete [] testA;
testA.clear();
}
};
int _tmain(int argc, _TCHAR* argv[])
{
_CrtSetDbgFlag ( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF );
char inp;
std::cin >> inp;
{
TestClass ttt[2];
TestClass* bbbb = new TestClass[2];
std::cin >> inp;
delete [] bbbb;
}
std::cin >> inp;
std::cin >> inp;
return 0;
}
Using this code, the program starts at about 1 meg of memory, goes up to more than 8 meg, then at the end drops down to 1.5 meg. Where does the additional .5 meg go? I am having a similar problem with a particle system but on the scale of hundreds of megabytes.
I cannot for the life of me figure out what is wrong.
As an aside, using the array (which I commented out) greatly reduces the wasted memory, but does not eliminate it completely. I would expect the memory usage at the last cin to be the same as at the first.
I am using the task manager to monitor memory usage.
Thanks.
"I cannot for the life of me figure out what is wrong."
Probably nothing.
"[Program] still uses more memory at program end after destroying all objects."
You should not really care about memory usage at program end. Any modern operating system cares about "freeing" all memory associated with a process, when the process ends. (Technically speaking, the address space of the process is simply released.)
Freeing memory at program end can actually slow down the termination of your program, since it unnecessarily needs to access memory pages which may even lie on swap space.
That additional 0.5 MB probably remains with your allocator (malloc/free, new/delete, std::allocator). These allocators usually request memory from the operating system when necessary and give memory back to the OS when convenient. Fragmentation could be one of the reasons why the allocator has to hold more memory than strictly required at a given moment. It is also usually faster to keep some memory in reserve, since requesting memory from the operating system is slow.
"I am using the task manager to monitor memory usage."
Measuring memory usage is in fact more sophisticated than observing a single number, and it requires good understanding of virtual memory and the memory management between a process and the operating system. Unfortunately I cannot recommend any good tools for Windows.
Overall, I think there is no issue with your simple program.