malloc allocates more memory than there is RAM - C++

I just executed a program that mallocs 13 MB on a machine with 12 MB of RAM (QEMU emulated!). Not just that, I even browsed through the memory and filled it with junk...
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define LONGMEM 13631488   /* 13 MB */

void
large_mem(void)
{
    long long *ptr = (long long *)malloc(LONGMEM);
    long long i;
    if (!ptr) {
        printf("%s(): array allocation of size %d failed.\n", __func__, LONGMEM);
        assert(0);
    }
    /* LONGMEM is a byte count, so index in long longs, not bytes,
       or the loop writes 8x past the end of the block */
    for (i = 0; i < LONGMEM / (long long)sizeof(long long); i++) {
        *(ptr + i) = i;
    }
    free(ptr);
}
How is this possible? I was expecting a segmentation fault.

It's called virtual memory, which is allocated for your program; it's not the physical memory you call RAM.
There is a maximum limit for virtual memory as well, but it's higher than the amount of RAM. It's implemented (and defined) by your operating system.

This is called lazy allocation.
Most OSes, like Linux, have a lazy-allocation memory model in which the returned memory address is a virtual address, and the actual allocation only happens at access time. The OS assumes that it will be able to provide this allocation at access time.
The memory allocated by malloc is not backed by real memory until the program actually touches it.
Since calloc initializes the memory to 0, by contrast, you can be assured that the OS has already backed the allocation with actual RAM (or swap).
Try using calloc and most probably it will return out of memory, unless your swap file/partition is big enough to satisfy the request.
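Here is a minimal sketch of the effect, assuming Linux (it reads VmRSS from /proc/self/status; the 100 MB size is arbitrary). The resident set barely moves after malloc and jumps only once the memory is actually touched:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print this process's resident set size (Linux-specific). */
static void print_rss(const char *label)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    while (f && fgets(line, sizeof line, f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%s %s", label, line);
    if (f) fclose(f);
}

int main()
{
    const size_t size = 100 * 1024 * 1024;   /* 100 MB */
    print_rss("before malloc:");
    char *p = (char *)malloc(size);
    if (!p) return 1;
    print_rss("after malloc: ");   /* barely changed: pages not backed yet */
    memset(p, 0, size);            /* touching the pages forces real allocation */
    print_rss("after memset: ");   /* now roughly 100 MB higher */
    free(p);
    return 0;
}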

Sounds like your operating system is swapping pages:
Paging is an important part of virtual memory implementation in most contemporary general-purpose operating systems, allowing them to use disk storage for data that does not fit into physical random-access memory (RAM).
In other words, the operating system is using some of your hard disk space to satisfy your 13 MB allocation request (at great expense of speed, since the hard disk is much, much slower than RAM).

Unless the virtualized OS has swap available, what you're encountering is called overcommit, and it basically exists because the easy way to manage resources in a system with virtual memory and demand/copy-on-write pages is not to manage them. Overcommit is a lot like a bank loaning out more money than it actually has -- it seems to work for a while, then things come crashing down. The first thing you should do when setting up a Linux system is fix this with the command:
echo "2" > /proc/sys/vm/overcommit_memory
That just affects the currently running kernel; you can make it permanent by adding a line to /etc/sysctl.conf:
vm.overcommit_memory=2

Related

Can I create an array exceeding RAM, if I have enough swap memory?

Let's say I have 8 Gigabytes of RAM and 16 Gigabytes of swap memory. Can I allocate a 20 Gigabyte array there in C? If yes, how is it possible? What would that memory layout look like?
Yes, you can. Note that accessing swap is veerry slooww.
how is it possible
Allocate dynamic memory. The operating system handles the rest.
What would that memory layout look like?
On an amd64 system, you can have 256 TiB of address space. You can easily fit a contiguous block of 20 GiB in that space. The operating system divides the virtual memory into pages and copies the pages between physical memory and swap space as needed.
Modern operating systems use virtual memory. In Linux and most other OSes, each process has its own address space, according to the abilities of the architecture. You can check the size of the virtual address space in /proc/cpuinfo. For example, you may see:
address sizes : 43 bits physical, 48 bits virtual
This means that virtual addresses use 48 bits. Half of that is reserved for the kernel, so you can only use 47 bits, or 128 TiB. Any memory you allocate will be placed somewhere in those 128 TiB of address space, as if you actually had that much memory.
Linux uses demand paging and, by default, overcommits memory. When you say
char *mem = (char*)malloc(1'000'000'000'000);
what happens is that Linux picks a suitable address and just records that you have allocated 1'000'000'000'000 bytes (rounded up to the nearest page) starting at that point. (It does some sanity checks that the amount isn't totally bonkers, depending on the amount of physical memory that is free, the amount of swap that is free, and the overcommit setting. By default you can allocate a lot more than you have memory and swap.)
Note that at this point no physical memory and no swap space is connected to your allocated block at all. This changes when you first write to the memory:
mem[4096] = 0;
At this point the program will page fault. Linux checks that the address is actually something your program is allowed to write to, finds a physical page and maps it at &mem[4096]. Then it lets the program retry the write and everything continues.
If Linux can't find a free physical page, it will try to swap something out to make one available for your program. If that also fails, your program will receive a SIGSEGV and likely die.
As a result, you can allocate basically unlimited amounts of memory, as long as you never write to more than physical memory and swap can support. On the other hand, if you initialize the memory (explicitly, or implicitly by using calloc()), the system will quickly notice if you try to use more than is available.
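A minimal sketch of that claim, assuming a 64-bit Linux machine. Whether the huge request succeeds depends on the overcommit setting and the heuristic sanity check mentioned above; with vm.overcommit_memory=2 it is refused up front:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    size_t huge = 1'000'000'000'000;   // 1 TB of virtual address space
    char *mem = (char *)malloc(huge);
    printf("malloc(1 TB) %s\n", mem ? "succeeded" : "failed");
    if (mem) {
        mem[4096] = 0;   // only now is a single physical page mapped in
        free(mem);
    }
    return 0;
}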
You can, but not with a simple malloc. It's platform-dependent.
It requires an OS call to allocate swappable memory (it's VirtualAlloc on Windows, for example; on Linux it would be mmap and related functions).
Once that's done, the allocated memory is divided into pages, contiguous blocks of fixed size. You can lock a page, so that it will be loaded into RAM and you can read and modify it freely. For old dinosaurs like me, it's exactly how EMS memory worked under DOS... You address your swappable memory with a kind of segment:offset method: first, you divide your linear address by the page size to find which page is needed, then you use the remainder to get the offset within that page.
Once unlocked, the page remains in memory until the OS needs the memory: then an unlocked page will be flushed to disk, into swap, and discarded from RAM... until you lock (and load...) it again, but this operation may require freeing RAM first, so another process may have its unlocked pages swapped out BEFORE your own page is loaded again. And this is damnably SLOW... even on an SSD!
So it's not always a good thing to use swap. A better way is to use memory-mapped files - perfect for reading very big files mostly sequentially, with few random accesses - if that suits your needs.
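For the memory-mapped-file approach, here is a minimal Linux sketch using mmap; big_data.bin is a placeholder file name and error handling is kept to the bare minimum:

#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    // Map a large file read-only; pages are loaded from disk on first
    // access and can be discarded by the OS without touching swap.
    const char *path = "big_data.bin";   // placeholder name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    char *data = (char *)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    // The whole file is addressable as one array, even if it exceeds RAM.
    printf("first byte: %d\n", data[0]);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}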

How can I make sure the std::vector allocated memory give back to operating system after deallocating?

The code below calls foo and uses while(1) so I can watch the memory usage. As I understand it, after 'finished' is printed, the variable d has been deallocated and the STL container will have freed the data space (heap) by itself.
#include <vector>
#include <string>
#include <iostream>

void foo() {
    std::vector<std::string> d(100000000);
    for (int i = 0; i < 100000000; ++i) d[i] = "1,1,3";
    d.resize(0);
    d.shrink_to_fit();
}

int main(int argc, char *argv[])
{
    foo();
    std::cout << "finished" << std::endl;
    while (1) {;}
    return 0;
}
But what I observed (using htop) is that the memory is not freed back to the operating system. This is just a benchmark; the real code is related to MESOS, which has a memory limitation for each process.
I have tried several compilers, such as g++-4.7.2, g++-4.8.1, and clang++, on a Linux server with glibc 2.15. I also used tcmalloc instead of the default malloc, but it still does not work (on a Mac machine the problem does not happen).
What's the problem? How can I make sure the memory is given back to the OS?
Thank you.
How can I make sure the memory is given back to the OS?
You can terminate your process.
What's the problem?
There probably isn't one. It's normal for programs not to return memory to the OS (though Linux does return memory early for some particularly large allocations). They normally use sbrk or an equivalent to grow the virtual address space available to them, and it's not normally worth the effort of trying to return deallocated memory. This may be counter-intuitive, but it has proven workable for millions of programs over many decades, so you shouldn't bother yourself with it unless you have a specific, tangible problem. It shouldn't cause problems for you, as the deallocated memory will be reused when the application performs further allocations, so the "MESOS memory limitation for each process" you mention still affects the "high watermark" of maximum instantaneous memory usage the same way.
Note that OSes with virtual memory support may swap long unused deallocated pages to disk so the backing RAM can be reused by the kernel or other apps.
It's also possible to take manual control of this using e.g. memory-mapped files, but writing such allocators and using them from Standard containers is a non-trivial undertaking... there are lots of other SO questions on how to approach that problem.
Allocating memory from the OS has two downsides:
High overhead. A system call involves a switch into protected mode, which takes much longer than a simple function call, and the memory management in the OS itself is probably quite complex.
High granularity. The OS probably has a minimum allocation size, such as 4 KB. That's a lot of overhead for a 6-byte string.
For these reasons the C++ memory allocator will only ask the OS for large blocks, then parcel out pieces of it when asked via new or malloc.
When those pieces of memory are released, they're put back into a pool to be handed out again on the next request. Now it's quite possible that all of the pieces of a larger block end up being freed, but how often does that happen in real life? Chances are that there will be at least one allocation per block that sticks around for a long time, preventing the block from being returned to the OS. And if it is returned, what do you think are the chances that the program will turn around and request it back again a short time later? As a practical matter it usually doesn't pay to return blocks to the OS. Your test program is a highly artificial case that isn't worth optimizing for.
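That said, if you really do need to hand freed heap pages back, glibc has a non-standard escape hatch, malloc_trim. A minimal sketch, assuming glibc (it will not compile elsewhere):

#include <malloc.h>   // glibc-specific, for malloc_trim
#include <string>
#include <vector>

int main()
{
    {
        std::vector<std::string> d(1000000, "1,1,3");
    }   // d is destroyed here; its memory returns to the allocator's pool

    // Ask glibc to give free heap pages back to the kernel (non-portable;
    // returns 1 if any memory was actually released).
    malloc_trim(0);

    while (true) {}   // watch RSS in htop: it should have dropped by now
    return 0;
}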
In most modern systems the operating system manages memory in pages. Application memory is managed in pools (heaps) by library functions. When your application allocates memory, the library functions attempt to find an available block of the size you requested. If the memory is not in the pool, the library calls the system to add more pages to the process and incorporates them into the pool (heap). When you free memory it goes back into the pool. The allocated pages in the pool do not return to the operating system.

Allocating more memory than exists using malloc

This code snippet allocates 2 GB every time it reads the letter 'u' from stdin, and initializes all the allocated chars once it reads 'a'.
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#define bytes 2147483648ULL
using namespace std;
int main()
{
    char input[8] = {0};   // was char[1] read with gets(): a guaranteed buffer overflow
    vector<char *> activate;
    while (input[0] != 'q')
    {
        fgets(input, sizeof input, stdin);   // gets() is unsafe and removed in C11/C++11
        if (input[0] == 'u')
        {
            char *m = (char *)malloc(bytes);
            if (m == NULL) cout << "can't allocate mem" << endl;
            else
            {
                cout << "ok" << endl;
                activate.push_back(m);   // only keep successful allocations
            }
        }
        else if (input[0] == 'a')
        {
            for (size_t i = 0; i < activate.size(); i++)
            {
                char *m = activate[i];
                for (unsigned long long x = 0; x < bytes; x++)
                {
                    m[x] = 'a';
                }
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring system resource usage with the htop tool, I have realized that the malloc operation is not reflected in the resources.
For example, when I input 'u' only once (i.e., allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e., initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM plus swap) and malloc allows it (i.e., NULL is not returned). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running, as root:
echo 2 > /proc/sys/vm/overcommit_memory
It is not a kernel feature that I like (so I always disable it). See malloc(3), mmap(2) and proc(5).
NB: echo 0 instead of echo 2 often - but not always - works too. Read the docs (in particular the proc man page just linked to).
From man malloc:
By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available.
So when you just want to allocate too much, it "lies" to you; when you actually want to use the allocated memory, it will try to find enough memory for you, and it might crash if it can't.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address range allocated with malloc(), the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system, of course, but most good ones do not incur the majority of the kernel overhead until the memory is first used.
The Hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior, because late paging introduces serious latency. Technically, all it does is put the latency off until a later, indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned: overcommitted memory. However, the explanation is kind of interesting.
Basically, when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of RAM/swap.
This is desirable behavior on most Linux environments, unless you've got a real-time OS or something where you know exactly who will need what resources, when, and why.
Otherwise somebody could come along, malloc up all the RAM (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside - but you only have a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles, which is nifty.
Initializing or otherwise working with the memory should work:
memset(m, 0, bytes);
You could also use calloc, which not only allocates the memory but also fills it with zeros for you:
char *m = (char *)calloc(1, bytes);
1. Is this a kernel bug?
No.
2. Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigating the need to know the eventual memory requirement - it's often convenient for an application to be able to allocate an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or realloc()s of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries can be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space spare, being able to have a sparse array and use direct indexing, or to allocate a hash table with a generous capacity()-to-size() ratio, can lead to a very high-performance system. Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or failing that much larger, or a small integral fraction thereof. (See the sketch after this list.)
Resource sharing - consider an ISP offering a "1 gigabit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 megabit each, but rely on their real-world experience that, though people ask for 1 gigabit and want a good fraction of it at specific times, there's inevitably some lower concurrent maximum and a much lower average. The same insight applied to memory allows operating systems to support more applications than they otherwise could, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an Internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit is exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. In summary, with this memory overcommit behavior enabled, simply checking that malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.
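Here is a Linux-specific sketch of the sparse-data idea, using an anonymous mmap so the huge table costs virtual address space but almost no physical memory (MAP_NORESERVE additionally asks the kernel not to reserve swap for it); a 64-bit system is assumed:

#include <stdio.h>
#include <sys/mman.h>

int main()
{
    // Sparse, directly indexed table over a huge key space. Anonymous
    // mappings are zero-filled and only get physical pages on first touch,
    // so only the entries we actually use cost real memory.
    const size_t slots = 1ULL << 32;            // ~4 billion slots
    const size_t bytes = slots * sizeof(int);   // 16 GiB of address space
    int *table = (int *)mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (table == MAP_FAILED) { perror("mmap"); return 1; }

    table[7] = 1;               // commits one 4 KiB page
    table[3000000000ULL] = 2;   // commits another page, gigabytes away

    printf("%d %d\n", table[7], table[3000000000ULL]);
    munmap(table, bytes);
    return 0;
}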

How and why can a memory allocation fail?

This was a question I asked myself when I was a student, but failing to get a satisfying answer, I little by little put it out of my mind... till today.
I know I can deal with a memory allocation error either by checking whether the returned pointer is NULL or by handling the bad_alloc exception.
OK, but I wonder: how and why can a call to new fail? To my knowledge, a memory allocation can fail if there is not enough space in the free store. But does this situation really occur nowadays, with several GB of RAM (at least on a regular computer; I am not talking about embedded systems)? Can we have other situations where a memory allocation failure may occur?
Although you've gotten a number of answers about why/how memory could fail, most of them are sort of ignoring reality.
In reality, on real systems, most of these arguments don't describe how things really work. Although they're right from the viewpoint that these are reasons an attempted memory allocation could fail, they're mostly wrong from the viewpoint of describing how things are typically going to work in reality.
Just for example, in Linux, if you try to allocate more memory than the system has available, your allocation will not fail (i.e., you won't get a null pointer or a std::bad_alloc exception). Instead, the system will "overcommit", so you get what appears to be a valid pointer -- but when/if you attempt to use all that memory, you'll get an exception, and/or the OOM killer will run, trying to free memory by killing processes that use a lot of memory. Unfortunately, this may just as easily kill the program making the request as other programs (in fact, many of the examples given that attempt to cause allocation failure by just repeatedly allocating big chunks of memory should probably be among the first to be killed).
Windows works a little closer to how the C and C++ standards envision things (but only a little). Windows is typically configured to expand the swap file if necessary to meet a memory allocation request. This means that as you allocate more memory, the system will go semi-crazy with swapping memory around, creating a bigger and bigger swap file to meet your request.
That will eventually fail, but on a system with lots of drive space, it might run for hours (most of them spent madly shuffling data around on the disk) before that happens. At least on a typical client machine, where the user is actually... well, using the computer, he'll notice that everything has dragged to a grinding halt and do something to stop it well before the allocation fails.
So, to get a memory allocation that truly fails, you're typically looking at something other than a typical desktop machine. A few examples include a server that runs unattended for weeks at a time and is so lightly loaded that nobody notices that it's been thrashing the disk for, say, 12 hours straight, or a machine running MS-DOS or some RTOS that doesn't supply virtual memory.
Bottom line: you're basically right, and they're basically wrong. While it's certainly true that if you allocate more memory than the machine supports, that something's got to give, it's generally not true that the failure will necessarily happen in the way prescribed by the C++ standard -- and, in fact, for typical desktop machines that's more the exception (pardon the pun) than the rule.
Apart from the obvious "out of memory" case, memory fragmentation can also cause this. Imagine a program that does the following:
until main memory is almost full:
allocate 1020 bytes
allocate 4 bytes
free all the 1020 byte blocks
If the memory manager puts all of these sequentially in memory in the order they are allocated, we now have plenty of free memory, but any allocation larger than 1020 bytes will not be able to find a contiguous space to put it, and will fail.
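A runnable toy version of that loop, for illustration only (on a 64-bit machine with a modern allocator you'll likely need to constrain the address space first, e.g. with ulimit -v, to see the final allocation fail):

#include <stdio.h>
#include <stdlib.h>
#include <vector>

int main()
{
    // Run with a constrained address space (e.g. ulimit -v 100000) so
    // "almost full" is reached before the OOM killer gets involved.
    const size_t max_pairs = 1 << 20;
    std::vector<void *> big, small;
    big.reserve(max_pairs);     // pre-reserve so the bookkeeping itself
    small.reserve(max_pairs);   // doesn't allocate during the exhaustion loop

    // Alternate 1020-byte and 4-byte allocations until one fails.
    for (size_t i = 0; i < max_pairs; i++) {
        void *a = malloc(1020);
        void *b = malloc(4);
        if (!a || !b) { free(a); free(b); break; }
        big.push_back(a);
        small.push_back(b);
    }

    for (size_t i = 0; i < big.size(); i++) free(big[i]);   // free only the 1020-byte blocks

    // Plenty of memory is free now, but it's interleaved with the live
    // 4-byte blocks, so a larger contiguous request can still fail.
    void *large = malloc(64 * 1024);
    printf("64 KB after fragmentation: %s\n", large ? "ok" : "failed");
    return 0;
}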
Usually on modern machines it will fail due to scarcity of virtual address space; if you have a 32-bit process that tries to allocate more than 2/3 GB of memory[1], even if there is physical RAM (or paging file) to satisfy the allocation, there simply won't be space in the virtual address space to map the newly allocated memory.
Another (similar) situation happens when the virtual address space is heavily fragmented, and thus the allocation fails because there aren't enough contiguous addresses for it.
Also, simply running out of memory can happen, and in fact I got into such a situation last week; but several operating systems (notably Linux) in this case don't return NULL: Linux will happily give you a pointer to an area of memory that isn't already committed, and actually allocate it when the program tries to write to it; if at that moment there's not enough memory, the kernel will try to kill some memory-hogging processes to free memory (an exception to this behavior seems to be when you try to allocate more than the whole capacity of the RAM and swap partition - in such a case you get a NULL upfront).
Another cause of getting NULL from malloc may be limits enforced by the OS on the process; for example, trying to run this code
#include <cstdlib>
#include <iostream>
#include <limits>

void mallocbsearch(std::size_t lower, std::size_t upper)
{
    std::cout << "[" << lower << ", " << upper << "]\n";
    if (upper - lower <= 1)
    {
        std::cout << "Found! " << lower << "\n";
        return;
    }
    std::size_t mid = lower + (upper - lower) / 2;
    void *ptr = std::malloc(mid);
    if (ptr)
    {
        std::free(ptr);
        mallocbsearch(mid, upper);
    }
    else
        mallocbsearch(lower, mid);
}

int main()
{
    mallocbsearch(0, std::numeric_limits<std::size_t>::max());
    return 0;
}
on Ideone you find that the maximum allocation size is about 530 MB, which is probably a limit enforced by setrlimit (similar mechanisms exist on Windows).
[1] It varies between OSes and can often be configured; the total virtual address space of a 32-bit process is 4 GB, but on all the current mainstream OSes a big chunk of it (the upper 2 GB, for 32-bit Windows with default settings) is reserved for kernel data.
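For completeness, here's a minimal sketch of imposing such a limit yourself with setrlimit on Linux (the 256 MB cap is arbitrary); under RLIMIT_AS, malloc returns NULL even with overcommit enabled, because the kernel refuses to extend the address space:

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main()
{
    // Cap this process's virtual address space at 256 MB, roughly the
    // kind of limit a sandbox like Ideone imposes.
    struct rlimit rl;
    rl.rlim_cur = 256 * 1024 * 1024;
    rl.rlim_max = 256 * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &rl) != 0) { perror("setrlimit"); return 1; }

    void *small_block = malloc(64 * 1024 * 1024);    // well under the cap
    void *big_block   = malloc(512 * 1024 * 1024);   // exceeds the cap
    printf("64 MB:  %s\n", small_block ? "ok" : "NULL");
    printf("512 MB: %s\n", big_block   ? "ok" : "NULL");   // NULL even with overcommit on

    free(small_block);
    free(big_block);
    return 0;
}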
The amount of memory available to the given process is finite. If the process exhausts its memory, and tries to allocate more, the allocation would fail.
There are other reasons why an allocation could fail. For example, the heap could get fragmented and not have a single free block large enough to satisfy the allocation request.

What could be the reason for virtual bytes to grow 2x private bytes?

An application's virtual bytes grow to twice its private bytes.
Does this indicate a memory leak? Bad application design?
The OS is 32-bit.
Any thoughts are welcome.
The application is a stream database.
Fragmentation.
If you allocate the following chunks of memory:
16KB
8KB
16KB
and you then free the 8 KB chunk, your application will have 32 KB of private bytes, but 40 KB of virtual memory, which is actually the highest virtual memory address that has ever been in use by your process (ignoring the other memory parts for the sake of simplicity).
Consider (if possible) using another memory manager. Some alternatives are:
The Windows low-fragmentation heap (see http://msdn.microsoft.com/en-us/library/aa366750%28VS.85%29.aspx for more info)
The Doug Lea open-source memory manager
Commercial alternatives like Hoard
A fourth alternative is to write your own memory manager. It's not that easy, but if done right it can have quite some benefits. Especially for certain niche or special applications, writing your own memory manager can be useful.
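For reference, the low-fragmentation heap is requested with HeapSetInformation; a minimal sketch (note that on Windows Vista and later the LFH is already enabled by default, so this mainly matters on older systems):

#include <windows.h>
#include <stdio.h>

int main()
{
    // Request the low-fragmentation heap on the process default heap.
    ULONG heapInfo = 2;   // 2 = enable the LFH for this heap
    BOOL ok = HeapSetInformation(GetProcessHeap(),
                                 HeapCompatibilityInformation,
                                 &heapInfo, sizeof(heapInfo));
    printf("LFH request: %s\n", ok ? "enabled" : "failed");
    return 0;
}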
An application's virtual bytes grow to twice its private bytes.
If the application allocates only heap, then to me it would be a sign that the application allocates lots of memory but never actually touches it. For example:
void *p = malloc( 16u<<20 );
would eat up 16 MB of virtual memory. But as long as the application doesn't perform any action on the memory block, the OS won't even attempt to map the virtual memory to RAM. The simplest way to force the actual allocation of private memory is to memset() it:
void *p = malloc( 16u<<20 );
memset( p, 0, 16u<<20 );
Does this indicate a memory leak? Bad application design?
Or both. Or neither.
The longer variant of the response: unknown; it depends on what memory the application allocates, what other resources the application uses, the OS, the h/w platform, etc.
If unsure, use memory leak analysis tools to investigate, e.g. valgrind. Read up on SO for more information on memory leak analysis in C++.
Memory allocation has overhead to store management information about what was allocated. If you're allocating very small buffers the extra information can be a significant percentage of the total. That might be what you're seeing.
One possibility is if you set a large stack reserve size for your threads with the linker option /STACK:reserve_bytes and then start a lot of threads.
For example, if you have an ATL service, it automatically starts 4*numberOfCores apartment message-dispatching threads by default. Compile and link such a service with /STACK:12000000 (12 megabytes), then run it on a 16-core server and it will start 64 threads, each with a 12 MB stack, immediately consuming 768 MB of virtual address space, although the actual committed memory may be much lower.