Delete class pointer does not free memory [duplicate] - c++

I need help understanding a problem with memory allocation and deallocation on Windows. I'm using the VS11 compiler (VS2012 IDE) with the latest update at the moment (Update 3 RC).
The problem: I dynamically allocate memory for a 2-dimensional array and immediately deallocate it. Before the allocation my process memory usage is 0.3 MB; right after the allocation loop it is 259.6 MB; while writing to the 32768 x 32768 array of ints it climbs to 4106.8 MB; but after deallocation it does not drop back to the expected 0.3 MB and instead stays at 12.7 MB. Since I deallocate all the heap memory I took, I expected the usage to return to 0.3 MB.
This is the C++ code I'm using:
#include <iostream>
#include <cstdio>   // std::getchar
#define SIZE 32768

int main(int argc, char* argv[]) {
    std::getchar();

    // allocate SIZE row pointers, then SIZE ints per row
    int** p_p_dynamic2d = new int*[SIZE];
    for (int i = 0; i < SIZE; i++) {
        p_p_dynamic2d[i] = new int[SIZE];
    }

    std::getchar();

    // touch every element so the pages actually get used
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            p_p_dynamic2d[i][j] = j + i;
        }
    }

    std::getchar();

    // free every row, then the array of row pointers
    for (int i = 0; i < SIZE; i++) {
        delete [] p_p_dynamic2d[i];
    }
    delete [] p_p_dynamic2d;

    std::getchar();
    return 0;
}

I'm sure this is a duplicate, but I'll answer it anyway:
If you are looking at the Task Manager figure, it shows the size of the process as the OS sees it. When there is no memory "pressure" (your system has plenty of memory available and no process is being starved), it makes no sense to shrink a process's virtual memory footprint. It's not unusual for a process to grow and shrink in a cyclical pattern: it allocates while processing data, releases that data at the end of a cycle, then allocates again for the next cycle. If the OS reclaimed those pages only to have to hand them back to your process moments later, it would be wasting work: assigning and unassigning pages to a particular process isn't entirely trivial, especially since pages whose previous owner can't be known for sure have to be "cleaned" (filled with zero or some other constant) so the new owner can't use them for fishing for old data, such as my password left behind in memory.
Even if the pages still belong to your process but aren't being used, the physical RAM behind them can be used by another process. So it's not a big deal if the pages haven't been released for some time.
Further, in debug mode the C++ runtime fills every block that goes through delete with a "this memory has been deleted" pattern, to help identify use-after-free bugs. So if your application is running in debug mode, don't expect freed memory to ever be returned to the OS. It will get reused, though: if you run your code three times over, the process won't grow to three times the size.
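If you want figures that are less misleading than the default Task Manager column, you can ask Windows directly. Here's a minimal sketch (my own, Windows-only; depending on the SDK you may need to link Psapi.lib) that prints the working set and the private (committed) bytes of the current process; calling it at each std::getchar() pause in the program above shows whether memory is still committed to the process or merely hasn't been trimmed from its working set:

#include <windows.h>
#include <psapi.h>   // GetProcessMemoryInfo; may require linking Psapi.lib
#include <iostream>

// Print what the process is really using: touched pages (working set)
// versus committed private bytes.
void print_memory_usage(const char* label) {
    PROCESS_MEMORY_COUNTERS_EX pmc = {};
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                             sizeof(pmc))) {
        std::cout << label << ": working set "
                  << pmc.WorkingSetSize / 1024 << " KB, private "
                  << pmc.PrivateUsage / 1024 << " KB\n";
    }
}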

Related

Linux promises more memory than it can give [duplicate]

Consider the following little program running on Linux:
#include <iostream>
#include <unistd.h>
#include <cstring>

int main() {
    const size_t array_size = 10ull * 1000 * 1000 * 1000;  // 10 GB per array
    const size_t number_of_arrays = 20;
    char* large_arrays[number_of_arrays];

    // allocate more memory than the system can give
    for (size_t i = 0; i < number_of_arrays; i++)
        large_arrays[i] = new char[array_size];

    // amount of free memory didn't actually change
    sleep(10);

    // write to that memory, so it is actually used
    for (size_t i = 0; i < number_of_arrays; i++)
        memset(large_arrays[i], 0, array_size);

    sleep(10);

    for (size_t i = 0; i < number_of_arrays; i++)
        delete [] large_arrays[i];

    return 0;
}
It allocates a lot of memory, more than the system can give, yet if I monitor memory usage with top, the amount of free memory doesn't actually decrease. The program waits a bit, then starts writing to the allocated memory, and only then does the amount of available free memory drop... until the system becomes unresponsive and the program is killed by the OOM killer.
My questions are:
Why does Linux promise to allocate more memory than it can actually provide? Shouldn't new[] throw a std::bad_alloc at some point?
How can I make sure that Linux actually takes a piece of memory without having to write to it? I am writing some benchmarks where I would like to allocate lots of memory fast, but at the same time I need to stay below a certain memory limit.
Is it possible to monitor the amount of this "promised" memory?
The kernel version is 3.10.0-514.21.1.el7.x86_64. Maybe it behaves differently on newer versions?
Why does Linux promise to allocate more memory than it can actually provide?
Because that is how your system has been configured. You can change the behaviour with the sysctl vm.overcommit_memory.
Shouldn't new[] throw a std::bad_alloc at some point?
Not if the system overcommits memory.
How can I make sure that Linux actually takes a piece of memory without having to write to it?
You can't, as far as I know. Linux maps physical pages lazily, on the page fault that occurs when unmapped memory is first accessed.
Is it possible to monitor the amount of this "promised" memory?
I think the "virtual" size of the process memory is what you're looking for.
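As a rough illustration (my own sketch, not part of the original answer): the per-process numbers live in /proc/self/status, where VmSize is the "promised" virtual size and VmRSS is what is actually backed by RAM, and the system-wide total of promised memory is Committed_AS in /proc/meminfo. Printing them before and after the allocation loop, and again after the memset loop, makes the overcommit visible:

#include <fstream>
#include <iostream>
#include <string>

// Dump the process's virtual size and resident size, plus the system-wide
// committed total, straight from the proc filesystem.
void print_vm_counters() {
    std::string line;
    std::ifstream status("/proc/self/status");
    while (std::getline(status, line)) {
        if (line.compare(0, 7, "VmSize:") == 0 || line.compare(0, 6, "VmRSS:") == 0)
            std::cout << line << '\n';
    }
    std::ifstream meminfo("/proc/meminfo");
    while (std::getline(meminfo, line)) {
        if (line.compare(0, 13, "Committed_AS:") == 0)
            std::cout << line << '\n';
    }
}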

Only able to allocate limited memory using new operator in CUDA

I wrote a CUDA kernel like this:
#include <iostream>
#include <cuda_runtime.h>
using namespace std;

__global__ void mykernel(int size, int* h) {
    double* x[size];             // per-launch array of pointers (size is a kernel parameter)
    for (int i = 0; i < size; i++) {
        x[i] = new double[2];    // in-kernel allocation from the device heap
    }
    h[0] = 20;
}

int main() {
    int size = 2.5 * 100000;     // or 10,000
    int* h = new int[size];
    int* u;
    size_t sizee = size * sizeof(int);
    cudaMalloc(&u, sizee);
    mykernel<<<size, 1>>>(size, u);
    cudaMemcpy(h, u, sizee, cudaMemcpyDeviceToHost);
    cout << h[0];
}
I have some other code in the kernel too, but I have commented it out; it also allocates some more memory.
Now when I run this with size = 2.5*10^5, h[0] comes back as 0.
When I run this with size = 100*100, h[0] comes back as 20.
So I am guessing that my kernels are crashing because I am running out of memory. I am using a Tesla C2075 card, which has 2 GB of RAM! I even tried this after shutting down the X server. What I am working on is not even 100 MB of data.
How can I allocate more memory to each block?
Now when I run this with size = 2.5*10^5, h[0] comes back as 0.
When I run this with size = 100*100, h[0] comes back as 20.
In your kernel launch, you are using this size variable also:
mykernel<<<size, 1>>>(size, u);
^^^^
On a cc2.0 device (Tesla C2075), this particular parameter is limited to 65535 in the 1D case. 2.5*10^5 exceeds 65535, but 100*100 does not. Therefore, your kernel probably runs when you specify a size of 100*100, but probably does not run at all when you specify a size of 2.5*10^5.
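If you'd rather confirm the limit on your own device than take the 65535 figure on faith, you can query it; this is just a small sketch of mine using the standard device-properties call:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0
    std::printf("max grid dimensions: %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}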
As has already been suggested to you, proper CUDA error checking would point this error out to you, and in general it will probably mean you need to ask far fewer questions on SO, as well as helping you post higher-quality questions. Take advantage of the CUDA runtime's ability to tell you when things have gone wrong and when you are making a mistake. Then you won't be left guessing, thinking you have a memory allocation problem when in fact you probably have a kernel launch configuration problem.
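For reference, here is a minimal sketch of the kind of error checking being suggested (the CUDA_CHECK name is my own; any equivalent wrapper does the job): check the return code of every runtime call, and check for launch and asynchronous errors after each kernel launch:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err = (call);                                        \
        if (err != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                         cudaGetErrorString(err), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Used around the code in the question, it would look like:
//   CUDA_CHECK(cudaMalloc(&u, sizee));
//   mykernel<<<size, 1>>>(size, u);
//   CUDA_CHECK(cudaGetLastError());       // catches an invalid launch configuration
//   CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during kernel execution
//   CUDA_CHECK(cudaMemcpy(h, u, sizee, cudaMemcpyDeviceToHost));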
How can I allocate more memory to each block?
Although it is probably not your main issue (as indicated above), in-kernel new and malloc allocate from the device heap, which is limited in size. Once it has been exhausted, further calls to new or malloc return a null pointer. If you use that null pointer anyway, your kernel code invokes undefined behavior and will likely crash.
When using new and malloc, especially when you're having trouble, it's good practice to check for a null return value. This applies to both host code (at least for malloc) and device code.
The size of the device heap is pretty small to begin with (8 MB), but it can be modified.
Referring to the documentation:
The device memory heap has a fixed size that must be specified before any program using malloc() or free() is loaded into the context. A default heap of eight megabytes is allocated if any program uses malloc() without explicitly specifying the heap size.
The following API functions get and set the heap size:
• cudaDeviceGetLimit(size_t* size, cudaLimitMallocHeapSize)
• cudaDeviceSetLimit(cudaLimitMallocHeapSize, size_t size)
The heap size granted will be at least size bytes. cuCtxGetLimit() and cudaDeviceGetLimit() return the currently requested heap size.
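Putting that together, a small sketch (my own; the 128 MB figure is only an example value) of raising the device heap before launching a kernel that uses in-kernel new, with the null-pointer check mentioned above:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void alloc_kernel() {
    double* p = new double[2];           // comes from the device heap
    if (p == NULL) {                     // heap exhausted: new returned null
        printf("in-kernel new failed\n");
        return;
    }
    p[0] = 1.0; p[1] = 2.0;
    delete [] p;
}

int main() {
    // Request a larger device heap *before* any kernel that uses malloc/new runs.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);
    size_t heap_bytes = 0;
    cudaDeviceGetLimit(&heap_bytes, cudaLimitMallocHeapSize);
    std::printf("device heap size: %llu bytes\n", (unsigned long long)heap_bytes);

    alloc_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}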

Memory leak with dynamic array of mpfr variables in c++

I have a simple C++ program using the multiprecision library MPFR, written to try to understand a memory problem in a bigger program:
#include <iostream>
#include <mpfr.h>
using namespace std;

int main() {
    int prec = 65536, size = 1, newsize = 1;
    mpfr_t** mf;
    while (true) {
        size = newsize;

        // allocate a size x size array of mpfr variables and initialize each one
        mf = new mpfr_t*[size];
        for (int i = 0; i < size; i++) {
            mf[i] = new mpfr_t[size];
            for (int j = 0; j < size; j++) mpfr_init2(mf[i][j], prec);
        }

        cout << "Size of array: ";
        cin >> newsize;

        // clear every mpfr variable, then free the arrays
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) mpfr_clear(mf[i][j]);
            delete [] mf[i];
        }
        delete [] mf;
    }
}
The point here is to declare arrays of different sizes and monitor the memory usage with Task Manager (I'm using Windows). This works fine for sizes below roughly 200, but if I declare something larger, the memory doesn't seem to be freed when I decrease the size again.
Here's an example run:
I start the program and choose size 50. Then I switch between sizes 50, 100, 150 and 200 and see the memory usage go up and down as expected. I then choose size 250 and the memory usage goes up as expected, but when I go back to 200 it doesn't decrease; instead it increases to roughly the sum of the values needed for sizes 200 and 250. Similar behaviour occurs with bigger sizes.
Any idea what's going on?
Process Explorer will give you a more realistic view of your process's memory usage (Virtual Size) than Task Manager will. A memory leak is when a program doesn't free memory it should have freed; if that happens continuously, its memory usage never stops increasing.
Windows won't necessarily return your program's freed memory to the system right away, so Task Manager and similar tools won't tell you the whole truth.
To detect memory leaks in Visual Studio you can enable the _CRTDBG_MAP_ALLOC macro, as described on this MSDN page.
Also, this question talks a bit about making it work with the C++ new keyword.
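For what it's worth, a minimal sketch of that approach (MSVC debug builds only; the DBG_NEW macro is the usual trick from that MSDN page for attributing leaks that come from new):

#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>

#ifdef _DEBUG
#define DBG_NEW new (_NORMAL_BLOCK, __FILE__, __LINE__)
#else
#define DBG_NEW new
#endif

int main() {
    int* leaked = DBG_NEW int[16];   // deliberately never freed
    (void)leaked;
    _CrtDumpMemoryLeaks();           // leak report goes to the debugger Output window
    return 0;
}

Note that a block that is still reachable but simply not returned to the OS (as in your program) is not a leak, and won't show up in this report.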

New is taking lots of extra memory

I'm making an application that is going to use many dynamically created objects (raytracing). Instead of calling [new] over and over again, I thought I'd make a simple memory system to speed things up. It's very simple at this point, as I don't need much.
My question is: when I run this test application, the version that uses my memory manager uses the expected amount of memory, but the same loop using [new] uses 2.5 to 3 times more. Is there something I'm not seeing here, or does [new] incur a huge overhead?
I am using VS 2010 on Win7, and I'm just using Task Manager to view the process memory usage.
template<typename CLASS_TYPE>
class MemFact
{
public:
    int m_obj_size;   // size of the incoming object
    int m_num_objs;   // number of instances
    char* m_mem;      // memory block

    MemFact(int num) : m_num_objs(num)
    {
        CLASS_TYPE t;
        m_obj_size = sizeof(t);
        m_mem = new char[m_obj_size * m_num_objs];
    }

    CLASS_TYPE* getInstance(int ID)
    {
        if (ID >= m_num_objs) return 0;
        return (CLASS_TYPE*)(m_mem + (ID * m_obj_size));
    }

    void release() { delete [] m_mem; m_mem = 0; }
};
/*---------------------------------------------------*/
class test_class
{
    float a,b,c,d,e,f,g,h,i,j; // 10 floats (40 bytes)
};
/*---------------------------------------------------*/
int main()
{
    int num = 10000000; // 10 M items

    // at this point we are using 400K memory
    MemFact<test_class> mem_fact(num);
    // now we're using 382MB memory

    for (int i = 0; i < num; i++)
        test_class* new_test = mem_fact.getInstance(i);

    mem_fact.release();
    // back down to 400K

    for (int i = 0; i < num; i++)
        test_class* new_test = new test_class();
    // now we are up to 972MB memory
}
There is a minimum size and granularity for a heap allocation, depending on the CRT you are using; allocations are typically rounded up to a multiple of 8 or 16 bytes. Your object is 40 bytes wide (ten 4-byte floats), so rounding alone can waste bytes on every allocation. The heap also has its own structures to keep track of which memory is free and which is not -- that's not free either. Your memory manager is much simpler (e.g. it frees all those objects in one go), which is inherently more efficient than what new has to do for the general case.
Also keep in mind that if you're building in debug mode, the debug allocator adds a header and pads both sides of each returned allocation with guard bytes (canaries) in an attempt to detect heap corruption and use-after-free. That adds several dozen bytes on top of each 40-byte object, which goes a long way toward explaining the roughly 97 bytes per object (972 MB / 10 M) you are seeing with plain new. That overhead is disabled when you build in release mode.
Boy, I sure hope nobody wants to allocate non-PODs from your memory manager, or objects of dynamic size, or minds instantiating it for every type, or wants to create as many objects as they like, or needs their lifetime to outlast the MemFact.
In fact, there is a valid pattern known as an Object Pool, which is similar to yours but doesn't suck. The simple answer is that operator new is required to be ultra-flexible: its objects must live until delete is called, their destructors must be called too, and they must all have completely separate, independent lifetimes. It must be able to allocate variable-size objects, of any type, at any time. Your MemFact meets none of these requirements. The Object Pool has fewer requirements too, and is significantly faster than regular new because of it, but it doesn't completely fail on all the other fronts either.
You're trying to compare an almost completely rotten apple with an orange.
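For reference, here is a minimal sketch of the object-pool idea (my own illustration, not code from the question or this answer): like MemFact it makes one big allocation up front, but it actually runs constructors and destructors, hands slots out and takes them back individually, and reports exhaustion with a null return. It assumes single-threaded use, a fixed capacity, and a type without extended alignment requirements:

#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity)
        : storage_(new unsigned char[capacity * sizeof(T)]), capacity_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i)
            free_.push_back(capacity - 1 - i);   // so slot 0 is handed out first
    }
    ~ObjectPool() { delete[] storage_; }         // destroy() every live object first

    // Construct a T in a free slot; returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_.empty()) return nullptr;
        std::size_t slot = free_.back();
        free_.pop_back();
        void* p = storage_ + slot * sizeof(T);
        return new (p) T(std::forward<Args>(args)...);   // placement new: the ctor runs
    }

    // Run the destructor and put the slot back on the free list.
    void destroy(T* obj) {
        obj->~T();
        std::size_t slot = static_cast<std::size_t>(
            reinterpret_cast<unsigned char*>(obj) - storage_) / sizeof(T);
        assert(slot < capacity_);
        free_.push_back(slot);
    }

private:
    unsigned char* storage_;              // one contiguous backing allocation
    std::size_t capacity_;
    std::vector<std::size_t> free_;       // indices of unused slots
};

Used as ObjectPool<test_class> pool(num); test_class* t = pool.create(); ... pool.destroy(t); it keeps the single-big-allocation behaviour you measured, while giving each object a real, independent lifetime inside the pool's lifetime.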