Linux promises more memory than it can give [duplicate] - c++

This question already has answers here:
Malloc on linux without overcommitting
(2 answers)
Closed 3 years ago.
Consider the following little program running on Linux:
#include <iostream>
#include <unistd.h>
#include <cstring>
int main() {
    const size_t array_size = 10ull * 1000 * 1000 * 1000;
    const size_t number_of_arrays = 20;
    char* large_arrays[number_of_arrays];
    // allocate more memory than the system can give
    for (size_t i = 0; i < number_of_arrays; i++)
        large_arrays[i] = new char[array_size];
    // amount of free memory didn't actually change
    sleep(10);
    // write to that memory, so it is actually used
    for (size_t i = 0; i < number_of_arrays; i++)
        memset(large_arrays[i], 0, array_size);
    sleep(10);
    for (size_t i = 0; i < number_of_arrays; i++)
        delete [] large_arrays[i];
    return 0;
}
It allocates a lot of memory, more than the system can give. However, if I monitor the memory usage with top, the amount of free memory doesn't actually decrease. The program waits a bit, then starts to write to the allocated memory, and only then does the amount of available free memory drop... until the system becomes unresponsive and the program is killed by the OOM killer.
My questions are:
Why does Linux promise to allocate more memory than it can actually provide? Shouldn't new[] throw a std::bad_alloc at some point?
How can I make sure that Linux actually takes a piece of memory without having to write to it? I am writing some benchmarks where I would like to allocate lots of memory fast, but at the same time I need to stay below a certain memory limit.
Is it possible to monitor the amount of this "promised" memory?
The kernel version is 3.10.0-514.21.1.el7.x86_64. Maybe it behaves differently on newer versions?

Why does Linux promise to allocate more memory than it can actually provide?
Because that is how your system has been configured. You can change the behaviour with the sysctl vm.overcommit_memory.
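For reference, a minimal sketch of how a program might inspect the current policy (it just reads the procfs file; this is an illustration, not part of the original answer). A value of 0 means heuristic overcommit (the default), 1 means always overcommit, and 2 means strict accounting, under which allocations beyond the commit limit fail:
#include <fstream>
#include <iostream>

int main() {
    // /proc/sys/vm/overcommit_memory holds the current overcommit policy:
    // 0 = heuristic overcommit (default), 1 = always overcommit,
    // 2 = strict accounting (requests beyond the commit limit fail).
    std::ifstream f("/proc/sys/vm/overcommit_memory");
    int policy = -1;
    if (f >> policy)
        std::cout << "vm.overcommit_memory = " << policy << '\n';
    else
        std::cerr << "could not read overcommit policy\n";
    return 0;
}
With vm.overcommit_memory set to 2 (e.g. via sysctl -w vm.overcommit_memory=2 as root), the new[] calls in the question would start throwing std::bad_alloc once the commit limit is reached, instead of succeeding and being killed later by the OOM killer.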
Shouldn't new[] throw a std::bad_alloc at some point?
Not if the system overcommits memory.
How can I make sure that Linux actually takes a piece of memory without having to write to it?
You can't, as far as I know. Linux maps physical pages on the first page fault, i.e. when the memory is actually accessed.
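As an illustration of that page-fault behaviour (not a way to avoid writing, which is what the question asked for): touching a single byte in every page is enough to make the kernel back the whole range with physical memory, and it is much cheaper than memset over the entire buffer. This is a sketch; sysconf(_SC_PAGESIZE) returns the actual page size, typically 4096 bytes:
#include <unistd.h>
#include <cstddef>

int main() {
    const std::size_t size = 1ull << 30;            // 1 GiB, for illustration only
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    char* buf = new char[size];
    // Writing one byte per page faults every page in, so the whole
    // range becomes resident (visible as RES in top).
    for (std::size_t off = 0; off < size; off += page)
        buf[off] = 0;
    delete [] buf;
    return 0;
}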
Is it possible to monitor the amount of this "promised" memory?
I think the "virtual" size of the process (VIRT in top, VmSize in /proc/<pid>/status) is what you're looking for.
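A sketch of reading those numbers directly (assuming the usual procfs layout): VmSize and VmRSS in /proc/self/status give a process' virtual and resident sizes, and Committed_AS in /proc/meminfo is the system-wide total of "promised" memory, with CommitLimit being the limit used under strict overcommit:
#include <fstream>
#include <initializer_list>
#include <iostream>
#include <string>

// Print the lines of a procfs file whose key matches one of the given prefixes.
static void print_fields(const char* path, std::initializer_list<std::string> keys) {
    std::ifstream f(path);
    std::string line;
    while (std::getline(f, line))
        for (const auto& k : keys)
            if (line.compare(0, k.size(), k) == 0)
                std::cout << line << '\n';
}

int main() {
    print_fields("/proc/self/status", {"VmSize", "VmRSS"});          // per process
    print_fields("/proc/meminfo", {"Committed_AS", "CommitLimit"});  // system wide
    return 0;
}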

Related

Only able to allocate limited memory using new operator in CUDA

I wrote a cuda kernel like this
__global__ void mykernel(int size, int *h){
    double *x[size];
    for(int i = 0; i < size; i++){
        x[i] = new double[2];
    }
    h[0] = 20;
}
void main(){
    int size = 2.5 * 100000; // or 10,000
    int *h = new int[size];
    int *u;
    size_t sizee = size * sizeof(int);
    cudaMalloc(&u, sizee);
    mykernel<<<size, 1>>>(size, u);
    cudaMemcpy(&h, &u, sizee, cudaMemcpyDeviceToHost);
    cout << h[0];
}
I have some other code in the kernel too, but I have commented it out. The code above also allocates some more memory.
Now when I run this with size = 2.5*10^5, I get the value of h[0] to be 0.
When I run this with size = 100*100, I get the value of h[0] to be 20.
So I am guessing that my kernels are crashing because I am running out of memory. I am using a Tesla C2075 card, which has 2 GB of RAM! I even tried this after shutting down the X server. What I am working on is not even 100 MB of data.
How can I allocate more memory to each block?
Now when I run this with size = 2.5*10^5, I get the value of h[0] to be 0.
When I run this with size = 100*100, I get the value of h[0] to be 20.
In your kernel launch, you are using this size variable also:
mykernel<<<size, 1>>>(size, u);
^^^^
On a cc2.0 device (Tesla C2075), this particular parameter in the 1D case is limited to 65535. So 2.5*10^5 exceeds 65535, but 100*100 does not. Therefore, your kernel may be running if you specify size of 100*100, but is probably not running if you specify size of 2.5*10^5.
As already suggested to you, proper CUDA error checking would point this error out to you, and in general it will let you ask far fewer, and higher-quality, questions on SO. Take advantage of the CUDA runtime's ability to tell you when things have gone wrong and when you are making a mistake. Then you won't be in a quandary, thinking you have a memory allocation problem when in fact you probably have a kernel launch configuration problem.
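For reference, one common shape of that error checking (a sketch, not the only way): wrap each runtime call, and after a kernel launch check both the launch error and the asynchronous error reported by a synchronize.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Usage after a kernel launch:
//   mykernel<<<grid, block>>>(...);
//   CUDA_CHECK(cudaGetLastError());        // catches launch configuration errors
//   CUDA_CHECK(cudaDeviceSynchronize());   // catches errors during kernel execution
A launch with a grid dimension above 65535 on this device would be reported as an invalid configuration argument by cudaGetLastError() right after the launch.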
How can I allocate more memory to each block?
Although it is probably not your main issue (as indicated above), in-kernel new and malloc are limited to the size of the device heap. Once this has been exhausted, further calls to new or malloc will return a null pointer. If you use this null pointer anyway, your kernel code will begin to perform unspecified behavior, and will likely crash.
When using new and malloc, especially when you're having trouble, it's good practice to check for a null return value. This applies to both host (at least for malloc) and device code.
The size of the device heap is pretty small to begin with (8MB), but it can be modified.
Referring to the documentation:
The device memory heap has a fixed size that must be specified before any program using malloc() or free() is loaded into the context. A default heap of eight megabytes is allocated if any program uses malloc() without explicitly specifying the heap size.
The following API functions get and set the heap size:
• cudaDeviceGetLimit(size_t* size, cudaLimitMallocHeapSize)
• cudaDeviceSetLimit(cudaLimitMallocHeapSize, size_t size)
The heap size granted will be at least size bytes. cuCtxGetLimit() and cudaDeviceGetLimit() return the currently requested heap size.
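A small sketch of raising that limit and detecting an in-kernel allocation failure (the 128 MB figure is an arbitrary choice for illustration):
#include <cstdio>
#include <cuda_runtime.h>

__global__ void alloc_kernel() {
    // In-kernel new returns a null pointer once the device heap is exhausted.
    double *p = new double[2];
    if (p == NULL) {
        printf("device heap exhausted in block %d\n", blockIdx.x);
        return;
    }
    p[0] = 1.0;
    delete [] p;
}

int main() {
    // Must be set before the first kernel that uses malloc/new is launched.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);  // 128 MB, arbitrary

    size_t heap = 0;
    cudaDeviceGetLimit(&heap, cudaLimitMallocHeapSize);
    printf("device heap size: %zu bytes\n", heap);

    alloc_kernel<<<256, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}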

Huge std::vector<std::vector> does not release all memory on destruction [duplicate]

This question already has answers here:
Linux Allocator Does Not Release Small Chunks of Memory
(4 answers)
Closed 8 years ago.
When using a very large vector of vectors we've found that part of the memory is not released.
#include <iostream>
#include <vector>
#include <unistd.h>
void foo()
{
    std::vector<std::vector<unsigned int> > voxelToPixel;
    unsigned int numElem = 1 << 27;
    voxelToPixel.resize( numElem );
    for (unsigned int idx = 0; idx < numElem; idx++)
        voxelToPixel.at(idx).push_back(idx);
}
int main()
{
    foo();
    std::cout << "End" << std::endl;
    sleep(30);
    return 0;
}
That leaves around 4GB of memory hanging until the process ends.
If we change the for line to
for (unsigned int idx = 0; idx < numElem; idx++)
    voxelToPixel.at(0).push_back(idx);
the memory is released.
Using gcc-4.8 on a Linux machine. We've used htop to track the memory usage on a computer with 100 GB of RAM. You will need around 8 GB of RAM to run the code. Can you reproduce the problem? Any ideas on why this is happening?
EDIT:
We've seen that this does not happen on a Mac (with either gcc or clang). Also, on Linux, the memory is freed if we call foo two times (but the problem happens again the third time).
Small allocations (up to 128 kB by default, I think) are managed by an in-process heap and are not returned to the OS when they're deallocated; they're returned to the heap for reuse within the process. Larger allocations come directly from the OS (via mmap) and are returned to the OS when deallocated.
In your first example, each inner vector only needs to allocate enough space for a single unsigned int. You have over a hundred million small allocations, none of which will be returned to the OS.
In the second example, as the single vector grows, it will make many allocations of various sizes. Some are smaller than the mmap threshold and will remain in the process memory; but since you only do this to one vector, that won't be a huge amount. If you were to use resize or reserve to allocate all the memory for the vector before populating it, then you should find that all the memory is returned to the OS.
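A minimal sketch of that suggestion applied to the second variant: reserving the full capacity up front means the inner vector's storage is one large block, which glibc typically obtains via mmap and hands straight back to the OS when the vector is destroyed.
#include <vector>

void foo()
{
    std::vector<std::vector<unsigned int> > voxelToPixel;
    const unsigned int numElem = 1u << 27;
    voxelToPixel.resize( numElem );
    // One 512 MB allocation instead of a series of growing reallocations:
    voxelToPixel.at(0).reserve( numElem );
    for (unsigned int idx = 0; idx < numElem; idx++)
        voxelToPixel.at(0).push_back(idx);
}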

Delete class pointer does not free memory [duplicate]

I need help understanding problems with my memory allocation and deallocation on Windows. I'm using the VS11 compiler (VS2012 IDE) with the latest update at the moment (Update 3 RC).
The problem is: I'm allocating some memory dynamically for a 2-dimensional array and immediately deallocating it. Still, my process memory usage is 0.3 MB before allocation, 259.6 MB on allocation (expected, since 32768 arrays of 64-bit ints (8 bytes) are allocated), and 4106.8 MB after writing to the arrays, but after deallocation the memory usage does not drop to the expected 0.3 MB; it is stuck at 12.7 MB. Since I'm deallocating all the heap memory I've taken, I expected the memory to go back to 0.3 MB.
This is the code in C++ I'm using:
#include <iostream>
#include <cstdio>
#define SIZE 32768
int main( int argc, char* argv[] ) {
    std::getchar();
    int ** p_p_dynamic2d = new int*[SIZE];
    for(int i=0; i<SIZE; i++){
        p_p_dynamic2d[i] = new int[SIZE];
    }
    std::getchar();
    for(int i=0; i<SIZE; i++){
        for(int j=0; j<SIZE; j++){
            p_p_dynamic2d[i][j] = j+i;
        }
    }
    std::getchar();
    for(int i=0; i<SIZE; i++) {
        delete [] p_p_dynamic2d[i];
    }
    delete [] p_p_dynamic2d;
    std::getchar();
    return 0;
}
I'm sure this is a duplicate, but I'll answer it anyway:
If you are viewing the Task Manager size, it gives you the size of the process. If there is no "pressure" (your system has plenty of memory available and no process is being starved), it makes no sense to reduce a process's virtual memory usage. It's not unusual for a process to grow and shrink in a cyclical pattern as it allocates memory while processing data, releases the data used in one processing cycle, then allocates memory for the next cycle and frees it again. If the OS were to "regain" those pages of memory, only to have to give them back to your process again, that would be a waste of processing power: assigning and unassigning pages to a particular process isn't entirely trivial, especially if you can't know for sure who those pages belonged to in the first place, since they need to be "cleaned" (filled with zero or some other constant) to ensure the "new owner" can't use the memory for "fishing for old data", such as finding my password stored in memory.
Even if the pages still remain in the ownership of this process but are not being used, the actual RAM can be used by another process. So it's not a big deal if the pages haven't been released for some time.
Further, in debug mode, the C++ runtime stores a "this memory has been deleted" pattern in all memory that goes through delete, to help identify use-after-free bugs. So if your application is running in debug mode, don't expect any freed memory to be released EVER. It will get reused, though; if you run your code three times over, it won't grow to three times the size.
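As a side note that is not part of the original answer: allocating the whole 2-D array as one contiguous block usually means the allocator satisfies it with a single large OS-level allocation, which it returns as soon as the block is deleted, so the Task Manager figure drops roughly as the question expects (release build assumed):
#include <cstdio>
#include <cstddef>
#define SIZE 32768
int main() {
    std::getchar();
    // One contiguous block instead of SIZE separate row allocations.
    const std::size_t total = static_cast<std::size_t>(SIZE) * SIZE;
    int* data = new int[total];
    for (std::size_t i = 0; i < total; i++)
        data[i] = static_cast<int>(i / SIZE + i % SIZE);   // same i+j values, flattened
    std::getchar();
    delete [] data;   // a block this large is typically returned to the OS immediately
    std::getchar();
    return 0;
}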

Memory usage isn't decreasing when using free?

Somehow this call to free() is not working. I ran this application on Windows and followed the memory usage in Task Manager, but saw no reduction in memory usage after the call to free().
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i = 0;
    int *ptr;
    ptr = (int*) malloc(sizeof(int) * 1000);
    for (i = 0; i < 1000; i++)
    {
        ptr[i] = 0;
    }
    free(ptr); // After this call, the program memory usage doesn't decrease
    system("PAUSE");
    return 0;
}
Typical C implementations do not return freed memory to the operating system. It is available for reuse by the same program, but not by others.
You cannot assume that the memory will be returned to the OS right after the free. CRT implementations generally have optimizations because of which they may not return this memory immediately; this lets the CRT satisfy subsequent allocation requests faster.
Note that Task Manager shows the memory "borrowed" by the C library from the system. Not every malloc goes through the C library to the operating system, and similarly not every free releases system memory.
Usually, the C library allocates memory from the OS in larger chunks to supply several malloc calls.
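As a hedged illustration of that chunking behaviour (the exact thresholds are allocator-specific): many small allocations are usually retained by the allocator after free() for reuse, although some allocators do trim, while a single large allocation is typically obtained directly from the OS (mmap on Linux, VirtualAlloc on Windows) and handed back as soon as it is freed.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cstddef>
#include <vector>

int main() {
    const std::size_t count = 100000;                 // ~100 MB in small pieces
    const std::size_t big_size = 100u * 1024 * 1024;  // 100 MB in one piece

    // Many small allocations: served from the in-process heap and
    // usually retained by the allocator after free() for reuse.
    std::vector<void*> small(count);
    for (std::size_t i = 0; i < count; i++) {
        small[i] = malloc(1000);
        memset(small[i], 0, 1000);
    }
    std::getchar();                  // usage is up by roughly 100 MB
    for (std::size_t i = 0; i < count; i++)
        free(small[i]);
    std::getchar();                  // whether usage drops here depends on the allocator

    // One large allocation: typically requested straight from the OS
    // and returned to it as soon as it is freed.
    void* big = malloc(big_size);
    if (big != NULL)
        memset(big, 0, big_size);
    std::getchar();                  // usage is up by another ~100 MB
    free(big);
    std::getchar();                  // ...and typically drops again after this free
    return 0;
}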