Allocating more memory than there exists using malloc - c++

This code snippet allocates 2 GB every time it reads the letter 'u' from stdin, and initializes all the allocated chars once it reads 'a'.
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>
#define bytes 2147483648UL
using namespace std;
int main()
{
    int input = 0;
    vector<char *> activate;   // every block that was allocated successfully
    while ((input = getchar()) != EOF && input != 'q')
    {
        if (input == 'u')
        {
            char *m = (char*)malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else
            {
                cout << "ok" << endl;
                activate.push_back(m);   // remember it so 'a' can touch it later
            }
        }
        else if (input == 'a')
        {
            for (size_t i = 0; i < activate.size(); i++)
            {
                char *m = activate[i];
                // writing to every byte forces the kernel to back the pages
                for (size_t x = 0; x < bytes; x++)
                    m[x] = 'a';
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage with htop, I realized that the malloc operation is not reflected in the resource usage.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increase by 2 GB in htop. Only when I input 'a' (i.e. initialize) do I see the memory usage increase.
As a consequence, I am able to "malloc" more heap memory than exists. For example, I can malloc 6 GB (which is more than my RAM and swap combined) and malloc allows it (i.e. it does not return NULL). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1.Is this a kernel bug?
2.Can someone explain to me why this behavior is allowed?

It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
and it is not a kernel feature that I like (so I always disable it). See malloc(3) and mmap(2) and proc(5)
NB: echo 0 instead of echo 2 often -but not always- works also. Read the docs (in particular proc man page that I just linked to).
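Before changing the setting, it can help to check which mode the machine is currently in. The following is a minimal sketch, assuming a Linux system where /proc is mounted; the function names are made up for illustration and are not part of any library.

```cpp
#include <fstream>
#include <string>

// Returns the current overcommit mode (0, 1 or 2), or -1 if the file
// cannot be read (for example when not running on Linux).
int read_overcommit_mode(const std::string& path = "/proc/sys/vm/overcommit_memory") {
    std::ifstream in(path);
    int mode = -1;
    if (!(in >> mode)) return -1;
    return mode;
}

// Human-readable description of a mode value, following proc(5).
const char* overcommit_mode_name(int mode) {
    switch (mode) {
        case 0: return "heuristic overcommit (default)";
        case 1: return "always overcommit";
        case 2: return "strict accounting (no overcommit)";
        default: return "unknown";
    }
}
```

Calling overcommit_mode_name(read_overcommit_mode()) from a small main() gives the same information as cat /proc/sys/vm/overcommit_memory, just with a label attached.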

from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
So when you merely allocate too much, malloc "lies" to you. When you then use the allocated memory, the kernel tries to find enough physical memory for you, and the process might crash if it can't.
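The effect described in the man page can be observed from inside a process. Here is a rough sketch, assuming Linux with /proc available; resident_bytes and measure_lazy_allocation are hypothetical helper names. It compares the resident set size right after malloc() with the resident set size after one byte per page has been written.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <unistd.h>

// Resident set size of this process in bytes, taken from the second field
// of /proc/self/statm (which is in pages). Returns -1 if unavailable.
long resident_bytes() {
    std::ifstream statm("/proc/self/statm");
    long total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages)) return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

struct TouchResult {
    long growth_after_malloc;  // RSS growth right after malloc()
    long growth_after_touch;   // RSS growth after writing to every page
};

TouchResult measure_lazy_allocation(std::size_t size) {
    TouchResult r{-1, -1};
    long before = resident_bytes();
    char* m = static_cast<char*>(std::malloc(size));
    if (m == nullptr || before < 0) { std::free(m); return r; }
    r.growth_after_malloc = resident_bytes() - before;

    // Write one byte per page through a volatile pointer so the compiler
    // cannot drop the stores; each write makes the kernel back that page.
    volatile char* p = m;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    for (std::size_t off = 0; off < size; off += page) p[off] = 'a';

    r.growth_after_touch = resident_bytes() - before;
    std::free(m);
    return r;
}
```

With a 64 MiB request, the first number typically stays close to zero while the second is close to the full 64 MiB, which matches what htop shows for the program in the question.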

No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address allocated with malloc (...) the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.

This is a result of what Basile mentioned: memory overcommit. However, the explanation is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the ram (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside, but you only have a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and work with them through ordinary pointers, which is nifty.

Initializing / working with the memory should work:
memset(m, 0, bytes);
Also you could use calloc that not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);

1.Is this a kernel bug?
2.Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigate the need to know the eventual memory requirement - it's often convenient for an application to be able to allocate an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or a realloc() of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries could be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space spare, being able to have a sparse array and use direct indexing, or allocate a hash table with generous capacity() to size() ratio, can lead to a very high performance system. Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or failing that much larger or a small integral fraction thereof.
Resource sharing - consider an ISP offering a "1 giga-bit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 mega-bit, but rely on their real-world experience that, though people ask for 1 giga-bit and want a good fraction of it at specific times, there's inevitably some lower maximum and much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise would, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit's exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. Summarily, with this memory overcommit behaviour enabled, simply checking malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.


Fast synchronized access to shared array with changing base address (in C11)

I am currently designing a user space scheduler in C11 for a custom co-processor under Linux (user space, because the co-processor does not run its own OS, but is controlled by software running on the host CPU). It keeps track of all the tasks' states with an array. Task states are regular integers in this case. The array is dynamically allocated and each time a new task is submitted whose state does not fit into the array anymore, the array is reallocated to twice its current size. The scheduler uses multiple threads and thus needs to synchronize its data structures.
Now, the problem is that I very often need to read entries in that array, since I need to know the states of tasks for scheduling decisions and resource management. If the base address was guaranteed to always be the same after each reallocation, I would simply use C11 atomics for accessing it. Unfortunately, realloc obviously cannot give such a guarantee. So my current approach is wrapping each access (reads AND writes) with one big lock in the form of a pthread mutex. Obviously, this is really slow, since there is locking overhead for each read, and the read is really small, since it only consists of a single integer.
To clarify the problem, I give some code here showing the relevant passages:
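// Writing:
// pthread_mutex_t mut;
// size_t len_arr;
// int *array, idx, x;
pthread_mutex_lock(&mut);
if (idx >= len_arr) {
    len_arr *= 2;
    array = realloc(array, len_arr*sizeof(int));
    if (array == NULL)
        abort();   // out of memory
}
array[idx] = x;
pthread_mutex_unlock(&mut);
// Reading:
// pthread_mutex_t mut;
// int *array, idx;
pthread_mutex_lock(&mut);
int x = array[idx];
pthread_mutex_unlock(&mut);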
I have already used C11 atomics for efficient synchronization elsewhere in the implementation and would love to use them to solve this problem as well, but I could not find an efficient way to do so. In a perfect world, there would be an atomic accessor for arrays which performs address calculation and memory read/write in a single atomic operation. Unfortunately, I could not find such an operation. But maybe there is a similarly fast or even faster way of achieving synchronization in this situation?
I forgot to specify that I cannot reuse slots in the array when tasks terminate. Since I guarantee access to the state of every task ever submitted since the scheduler was started, I need to store the final state of each task until the application terminates. Thus, static allocation is not really an option.
Do you need to be so economical with virtual address space? Can't you just set a very big upper limit and allocate enough address space for it (maybe even a static array, or dynamic if you want the upper limit to be set at startup from command-line options).
Linux does lazy memory allocation, so virtual pages that you never touch aren't actually using any physical memory. See Why is iterating though `std::vector` faster than iterating though `std::array`?, which shows by example that reading or writing an anonymous page for the first time causes a page fault. If it was a read access, it gets the kernel to CoW (copy-on-write) map it to a shared physical zero page. Only an initial write, or a write to a CoW page, triggers actual allocation of a physical page.
Leaving virtual pages completely untouched avoids even the overhead of wiring them into the hardware page tables.
If you're targeting a 64-bit ISA like x86-64, you have boatloads of virtual address space. Using up more virtual address space (as long as you aren't wasting physical pages) is basically fine.
Practical example of allocating more virtual address space than you can use:
If you allocate more memory than you could ever practically use (touching it all would definitely segfault or invoke the kernel's OOM killer), that will be as large or larger than you could ever grow via realloc.
To allocate this much, you may need to globally set /proc/sys/vm/overcommit_memory to 1 (no checking) instead of the default 0 (heuristic which makes extremely large allocations fail). Or use mmap(MAP_NORESERVE) to allocate it, making that one mapping just best-effort growth on page-faults.
The documentation says you might get a SIGSEGV on touching memory allocated with MAP_NORESERVE, which is different than invoking the OOM killer. But I think once you've already successfully touched memory, it is yours and won't get discarded. I think it's also not going to spuriously fail unless you're actually running out of RAM + swap space. IDK how you plan to detect that in your current design (which sounds pretty sketchy if you have no way to ever deallocate).
Test program:
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
int main(void) {
    size_t sz = 1ULL << 46; // 2**46 = 64 TiB = max power of 2 for x86-64 with 48-bit virtual addresses
    // in practice 1ULL << 40 (1TiB) should be more than enough.
    // the smaller you pick, the less impact if multiple things use this trick in the same program
    //int *p = aligned_alloc(64, sz); // doesn't use NORESERVE so it will be limited by overcommit settings
    int *p = mmap(NULL, sz, PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    madvise(p, sz, MADV_HUGEPAGE); // for good measure to reduce page-faults and TLB misses, since you're using large contiguous chunks of this array
    p[1000000000] = 1234; // or sz/sizeof(int) - 1 will also work; this is only touching 1 page somewhere in the array.
    printf("%p\n", (void*)p);
    return 0;
}
$ gcc -Og -g -Wall alloc.c
$ strace ./a.out
... process startup
mmap(NULL, 70368744177664, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x15c71ef7c000
madvise(0x15c71ef7c000, 70368744177664, MADV_HUGEPAGE) = 0
... stdio stuff
write(1, "0x15c71ef7c000\n", 15) = 15
exit_group(0) = ?
+++ exited with 0 +++
My desktop has 16GiB of RAM (a lot of it in use by Chromium and some big files in /tmp) + 2GiB of swap. Yet this program allocated 64 TiB of virtual address space and touched 1 int of it nearly instantly. Not measurably slower than if it had only allocated 1MiB. (And future performance from actually using that memory should also be unaffected.)
The largest power-of-2 you can expect to work on current x86-64 hardware is 1ULL << 46. The total lower canonical range of the 48-bit virtual address space is 47 bits (user-space virtual address space on Linux), and some of that is already allocated for stack/code/data. Allocating a contiguous 64 TiB chunk of that still leaves plenty for other allocations.
(If you do actually have that much RAM + swap, you're probably waiting for a new CPU with 5-level page tables so you can use even more virtual address space.)
Speaking of page tables, the larger the array the more chance of putting some other future allocations very very far from existing blocks. This can have a minor cost in TLB-miss (page walk) time, if your actual in-use pages end up more scattered around your address space in more different sub-trees of the multi-level page tables. That's more page-table memory to keep in cache (including cached within the page-walk hardware).
The allocation size doesn't have to be a power of 2 but it might as well be. There's also no reason to make it that big. 1ULL << 40 (1TiB) should be fine on most systems. IDK if having more than half the available address space for a process allocated could slow future allocations; bookkeeping is I think based on extents (ptr + length) not bitmaps.
Keep in mind that if everyone starts doing this for random arrays in libraries, that could use up a lot of address space. This is great for the main array in a program that spends a lot of time using it. Keep it as small as you can while still being big enough to always be more than you need. (Optionally make it a config parameter if you want to avoid a "640kiB is enough for everyone" situation). Using up virtual address space is very low-cost, but it's probably better to use less.
Think of this as reserving space for future growth but not actually using it until you touch it, even though by some ways of looking at it the memory already is "allocated". In Linux it really isn't: Linux defaults to allowing "overcommit", meaning processes can have more total anonymous memory mapped than the system has physical RAM + swap. If too many processes try to use too much by actually touching all that allocated memory, the OOM killer has to kill something (because the "allocate" system calls like mmap have already returned success).
(With MAP_NORESERVE, it's only reserving address space which is shared between threads, but not reserving any physical pages until you touch them.)
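To connect this back to the scheduler in the question, here is a hedged sketch of a task-state table whose base address never changes, so that individual entries can be accessed atomically without a lock. TaskStates and its members are hypothetical names, the GCC/Clang __atomic builtins stand in for C11 atomics, and Linux mmap with MAP_NORESERVE is assumed.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical fixed-base table of task states (not from the original post).
struct TaskStates {
    int* base = nullptr;       // never changes after reserve()
    std::size_t capacity = 0;  // number of int slots reserved

    // Reserve address space for max_tasks ints. No physical page is used
    // until a slot inside it is first written.
    bool reserve(std::size_t max_tasks) {
        void* p = mmap(nullptr, max_tasks * sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) return false;
        base = static_cast<int*>(p);
        capacity = max_tasks;
        return true;
    }

    // Lock-free accessors using GCC/Clang atomic builtins on plain ints.
    void store(std::size_t idx, int state) {
        __atomic_store_n(&base[idx], state, __ATOMIC_RELEASE);
    }
    int load(std::size_t idx) const {
        return __atomic_load_n(&base[idx], __ATOMIC_ACQUIRE);
    }

    void release() {
        if (base) munmap(base, capacity * sizeof(int));
        base = nullptr;
        capacity = 0;
    }
};
```

Because the array never moves, readers and writers need no mutex around the pointer itself. Untouched slots read as zero, so zero should be a state value that means "not submitted yet" in such a design.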
You probably want your array to be page-aligned: #include <stdalign.h> so you can use something like
alignas(4096) struct entry process_array[MAX_LEN];
Or for non-static, allocate it with C11 aligned_alloc().
Give back early parts of the array when you're sure all threads are done with it
Page alignment makes it easy to do the calculations to "give back" a memory page (4kiB on x86) if your array's logical size shrinks enough. madvise(addr, 4096*n, MADV_FREE); (Linux 4.5 and later). This is kind of like mmap(MAP_FIXED) to replace some pages with new untouched anonymous pages (that will read as zeroes), except it doesn't split up the logical mapping extents and create more bookkeeping for the kernel.
Don't bother with this unless you're returning multiple pages, and leave at least one page unfreed above the current top to avoid page faults if you grow again soon. Like maybe maintain a high-water mark that you've ever touched (without giving back) and a current logical size. If high_water - logical_size > 16 pages, give back all pages from 4 pages past the logical size up to the high-water mark.
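The high-water-mark rule above boils down to a small piece of page arithmetic. A minimal sketch, with hypothetical names and with the thresholds 16 and 4 taken from the paragraph above as default parameters:

```cpp
#include <cstddef>

struct PageRange {
    std::size_t first_page;  // index of the first page to give back
    std::size_t page_count;  // 0 means "do nothing"
};

// Sizes are byte offsets from the page-aligned start of the array.
PageRange pages_to_give_back(std::size_t logical_size, std::size_t high_water,
                             std::size_t page_size = 4096,
                             std::size_t threshold_pages = 16,
                             std::size_t slack_pages = 4) {
    // Round both sizes up to whole pages.
    std::size_t logical_pages = (logical_size + page_size - 1) / page_size;
    std::size_t high_pages = (high_water + page_size - 1) / page_size;
    // Only act when more than threshold_pages are touched but unused.
    if (high_pages <= logical_pages || high_pages - logical_pages <= threshold_pages)
        return {0, 0};
    // Keep slack_pages above the logical top to absorb quick regrowth.
    std::size_t first = logical_pages + slack_pages;
    if (first >= high_pages) return {0, 0};
    return {first, high_pages - first};
}
```

A caller would pass the result to madvise(base + first_page * page_size, page_count * page_size, MADV_FREE) and then lower its high-water mark to first_page * page_size.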
If you will typically be actually using/touching at least 2MiB of your array, use madvise(MADV_HUGEPAGE) when you allocate it to get the kernel to prefer using transparent hugepages. This will reduce TLB misses.
(Use strace to see return values from your madvise system calls, and look at /proc/PID/smaps, to see if your calls are having the desired effect.)
If up-front allocation is unacceptable, RCU (read-copy-update) might be viable if it's read-mostly. But copying a gigantic array every time an element changes isn't going to work.
You'd want a different data-structure entirely where only small parts need to be copied. Or something other than RCU; like your answer, you might not need the read side being always wait-free. The choice will depend on acceptable worst-case latency and/or average throughput, and also how much contention there is for any kind of ref counter that has to bounce around between all threads.
Too bad there isn't a realloc variant that attempts to grow without copying so you could attempt that before bothering other threads. (e.g. have threads with idx>len spin-wait on len in case it increases without the array address changing.)
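For memory obtained directly from mmap() rather than from malloc(), Linux does offer something close to this: mremap() called without MREMAP_MAYMOVE either extends the mapping in place or fails. This is a different mechanism from realloc and only a sketch; try_grow_in_place is a hypothetical helper name, and it assumes Linux with glibc (g++ and clang++ define _GNU_SOURCE by default, which exposes mremap).

```cpp
#include <cstddef>
#include <sys/mman.h>

// Try to grow an anonymous mmap()ed region without moving it.
// Returns true if the region now spans new_size bytes at the same address.
// Only valid for memory obtained from mmap(), not from malloc().
bool try_grow_in_place(void* addr, std::size_t old_size, std::size_t new_size) {
    // flags == 0 means MREMAP_MAYMOVE is not set: the kernel either
    // extends the mapping where it is or fails (errno is ENOMEM).
    void* p = mremap(addr, old_size, new_size, 0);
    return p != MAP_FAILED && p == addr;
}
```

If the call returns false, the addresses just past the mapping are already in use, and the caller would have to fall back to a moving reallocation that is coordinated with the other threads.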
So, I came up with a solution:
// Reading:
while(true) {
    cnt++;              // announce an access in progress
    if (wait) {
        cnt--;          // a reallocation is pending: back off and retry
        yield();
    } else {
        break;
    }
}
int x = array[idx];
cnt--;
// Writing:
if (idx == len) {
    wait = true;
    while (cnt > 0); // busy wait to minimize latency of reallocation
    array = realloc(array, 2*len*sizeof(int));
    if (!array) abort(); // shit happens
    len *= 2; // must not be updated before reallocation completed
    wait = false;
}
// this is why len must be updated after realloc,
// it serves for synchronization with other writers
// exceeding the current length limit
while (idx > len) {yield();}
while(true) {
    cnt++;
    if (wait) {
        cnt--;
        yield();
    } else {
        break;
    }
}
array[idx] = x;
cnt--;
wait is an atomic bool initialized as false, cnt is an atomic int initialized as zero.
This only works because I know that task IDs are chosen ascendingly without gaps and that no task state is read before it is initialized by the write operation. So I can always rely on the one thread which pulls the ID which only exceeds current array length by 1. New tasks created concurrently will block their thread until the responsible thread performed reallocation. Hence the busy wait, since the reallocation should happen quickly so the other threads do not have to wait for too long.
This way, I eliminate the bottlenecking big lock. Array accesses can be made concurrently at the cost of two atomic additions. Since reallocation occurs seldom (due to exponential growth), array access is practically block-free.
After taking a second look, I noticed that one has to be careful about reordering of stores around the length update. Also, the whole thing only works if concurrent writes always use different indices. This is the case for my implementation, but might not generally be. Thus, this is not as elegant as I thought and the solution presented in the accepted answer should be preferred.

How can I make sure memory allocated by std::vector is given back to the operating system after deallocation?

The code below calls foo and then spins in while(1) so that the memory usage can be watched. As far as I know, once 'finished' is printed, the variable d has been deallocated, and the STL container frees its data space (heap) by itself.
#include <vector>
#include <string>
#include <iostream>
void foo() {
    std::vector<std::string> d(100000000);
    for(int i = 0; i < 100000000; ++i) d[i] = "1,1,3";
}   // d is destroyed here
int main(int argc, char *argv[])
{
    foo();
    std::cout << "finished" << std::endl;
    while(1) {;}   // keep the process alive so htop can be inspected
    return 0;
}
But what I observed (using htop) is that the memory is not freed back to the operating system. This is just a benchmark; the real code runs under MESOS, which imposes a memory limit on each process.
I have tried several compiler versions, such as g++-4.7.2, g++-4.8.1 and clang++, on a Linux server with glibc 2.15. I have also tried tcmalloc instead of the default malloc, but it still does not work (on a Mac the problem does not occur).
What's the problem? How can I make sure the memory is given back to the OS?
Thank you.
How can I make sure the memory is given back to the OS?
You can terminate your process.
What's the problem?
There probably isn't one. It's normal for programs not to return memory (though Linux does return memory early for some particularly large allocations). They normally use sbrk or equivalent to grow the virtual address space available to them, but it's not normally worth the effort of trying to return deallocated memory. This may be counter-intuitive, but it's also proven workable for millions of programs over many decades, so you shouldn't bother yourself with it unless you have a specific tangible problem. It shouldn't cause problems for you as the deallocated memory will be reused when the application performs further allocations, so the "MESOS memory limitation for each process" you mention still affects the "high watermark" of maximum instantaneous memory usage the same way.
Note that OSes with virtual memory support may swap long unused deallocated pages to disk so the backing RAM can be reused by the kernel or other apps.
It's also possible to take manual control of this using e.g. memory mapped files, but writing such allocators and using them from Standard containers is a non-trivial undertaking... lots of other SO questions on how to approach that problem.
Allocating memory from the OS has two downsides:
High overhead. A system call involves a switch into kernel mode, which takes much longer than a simple function call, and then the memory management for the OS itself is probably quite complex.
High granularity. The OS probably has a minimum size allocation like 4K. That's a lot of overhead for a 6 byte string.
For these reasons the C++ memory allocator will only ask the OS for large blocks, then parcel out pieces of it when asked via new or malloc.
When those pieces of memory are released, they're put back into a pool to be handed out again on the next request. Now it's quite possible that all of the pieces of a larger block end up being freed, but how often does that happen in real life? Chances are that there will be at least one allocation per block that sticks around for a long time, preventing the block from being returned to the OS. And if it is returned, what do you think are the chances that the program will turn around and request it back again a short time later? As a practical matter it usually doesn't pay to return blocks to the OS. Your test program is a highly artificial case that isn't worth optimizing for.
In most modern systems the operating system manages memory in pages. Application memory is managed in pools (heaps) by library functions. When your application allocates memory, the library functions attempt to find an available block of the size you requested. If the memory is not in the pool, the library calls the system to add more pages to the process to incorporate into the pool(heap). When you free memory it goes back into the pool. The allocated pages in the pool do not return to the operating system.
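None of the answers above mention it, but glibc provides a non-standard call, malloc_trim(), that asks the allocator to hand free memory back to the kernel. The sketch below is an illustration under the assumption that the program is linked against glibc on Linux; rss_bytes_now and free_then_trim are hypothetical names. It frees many small blocks and compares the resident set size before and after the trim.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <malloc.h>   // malloc_trim (glibc-specific, not standard C++)
#include <unistd.h>
#include <vector>

// Resident set size in bytes from /proc/self/statm, or -1 if unavailable.
long rss_bytes_now() {
    std::ifstream statm("/proc/self/statm");
    long total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages)) return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

struct TrimResult {
    long before_trim;  // RSS after free() but before malloc_trim()
    long after_trim;   // RSS after malloc_trim(0)
};

TrimResult free_then_trim(std::size_t blocks, std::size_t block_size) {
    {
        std::vector<void*> ptrs;
        ptrs.reserve(blocks);
        for (std::size_t i = 0; i < blocks; ++i) {
            void* p = std::malloc(block_size);
            if (p == nullptr) break;
            static_cast<volatile char*>(p)[0] = 1;  // touch it so it becomes resident
            ptrs.push_back(p);
        }
        for (void* p : ptrs) std::free(p);  // goes back to the allocator's pool
    }
    TrimResult r;
    r.before_trim = rss_bytes_now();
    malloc_trim(0);                         // ask glibc to release free pages
    r.after_trim = rss_bytes_now();
    return r;
}
```

Whether the trim releases anything depends on the allocator version and on how the freed blocks are laid out, so the numbers are worth checking on the target system rather than assuming.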

How and why can a memory allocation fail?

This was a question I asked myself when I was a student, but failing to get a satisfying answer, I put it out of my mind little by little... until today.
I know I can deal with a memory allocation error either by checking if the returned pointer is NULL or by handling the bad_alloc exception.
OK, but I wonder: how and why can a call to new fail? To my knowledge, a memory allocation can fail if there is not enough space in the free store. But does this situation really occur nowadays, with several GB of RAM (at least on a regular computer; I am not talking about embedded systems)? Are there other situations where a memory allocation failure may occur?
Although you've gotten a number of answers about why/how memory could fail, most of them are sort of ignoring reality.
In reality, on real systems, most of these arguments don't describe how things really work. Although they're right from the viewpoint that these are reasons an attempted memory allocation could fail, they're mostly wrong from the viewpoint of describing how things are typically going to work in reality.
Just for example, in Linux, if you try to allocate more memory than the system has available, your allocation will not fail (i.e., you won't get a null pointer or a std::bad_alloc exception). Instead, the system will "over commit", so you get what appears to be a valid pointer -- but when/if you attempt to use all that memory, you'll get an exception, and/or the OOM Killer will run, trying to free memory by killing processes that use a lot of memory. Unfortunately, this is about as likely to kill the program making the request as other programs (in fact, many of the examples given that attempt to cause allocation failure by just repeatedly allocating big chunks of memory should probably be among the first to be killed).
Windows works a little closer to how the C and C++ standards envision things (but only a little). Windows is typically configured to expand the swap file if necessary to meet a memory allocation request. This means that as you allocate more memory, the system will go semi-crazy with swapping memory around, creating bigger and bigger swap files to meet your request.
That will eventually fail, but on a system with lots of drive space, it might run for hours (most of it madly shuffling data around on the disk) before that happens. At least on a typical client machine where the user is actually...well, using the computer, he'll notice that everything has dragged to a grinding halt, and do something to stop it well before the allocation fails.
So, to get a memory allocation that truly fails, you're typically looking for something other than a typical desktop machine. A few examples include a server that runs unattended for weeks at a time, and is so lightly loaded that nobody notices that it's thrashing the disk for, say, 12 hours straight, or a machine running MS-DOS or some RTOS that doesn't supply virtual memory.
Bottom line: you're basically right, and they're basically wrong. While it's certainly true that if you allocate more memory than the machine supports, that something's got to give, it's generally not true that the failure will necessarily happen in the way prescribed by the C++ standard -- and, in fact, for typical desktop machines that's more the exception (pardon the pun) than the rule.
Apart from the obvious "out of memory", memory fragmentation can also cause this. Imagine a program that does the following:
until main memory is almost full:
    allocate 1020 bytes
    allocate 4 bytes
free all the 1020 byte blocks
If the memory manager puts all these sequentially in memory in the order they are allocated, we now have plenty of free memory, but any allocation larger than 1020 bytes will not be able to find a contiguous space to put them, and fail.
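The arithmetic behind this can be checked with a toy model. The sketch below is not a real allocator; it only assumes, as the paragraph above does, that blocks are placed back to back in allocation order, and the names are invented for illustration.

```cpp
#include <cstddef>

struct FragmentationStats {
    std::size_t total_free;        // sum of all free bytes after the frees
    std::size_t largest_free_run;  // biggest contiguous free span
};

// Fills a heap of heap_size bytes with alternating big and small blocks,
// then frees every big block while the small ones stay allocated.
FragmentationStats simulate_fragmentation(std::size_t heap_size,
                                          std::size_t big = 1020,
                                          std::size_t small = 4) {
    FragmentationStats s{0, 0};
    std::size_t offset = 0;
    // Each iteration places one big block followed by one small block.
    while (offset + big + small <= heap_size) {
        offset += big + small;
        s.total_free += big;  // the big block is freed afterwards
        if (big > s.largest_free_run) s.largest_free_run = big;
    }
    // Whatever is left at the end of the heap is free and contiguous too.
    std::size_t tail = heap_size - offset;
    s.total_free += tail;
    if (tail > s.largest_free_run) s.largest_free_run = tail;
    return s;
}
```

For a 1 MiB heap this gives 1,044,480 free bytes (over 99% of the heap) while the largest contiguous run is still only 1020 bytes, so a request for 1021 bytes could not be satisfied.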
Usually on modern machines it will fail due to scarcity of virtual address space; if you have a 32 bit process that tries to allocate more than 2 or 3 GB of memory [1], even if there would be physical RAM (or paging file) to satisfy the allocation, simply there won't be space in the virtual address space to map such newly allocated memory.
Another (similar) situation happens when the virtual address space is heavily fragmented, and thus the allocation fails because there's not enough contiguous addresses for it.
Also, running out of memory can happen, and in fact I got in such a situation last week; but several operating systems (notably Linux) in this case don't return NULL: Linux will happily give you a pointer to an area of memory that isn't already committed, and actually allocate it when the program tries to write in it; if at that moment there's not enough memory, the kernel will try to kill some memory-hogging processes to free memory (an exception to this behavior seems to be when you try to allocate more than the whole capacity of the RAM and of the swap partition - in such a case you get a NULL upfront).
Another cause of getting NULL from a malloc may be due to limits enforced by the OS over the process; for example, trying to run this code
#include <cstdlib>
#include <iostream>
#include <limits>
void mallocbsearch(std::size_t lower, std::size_t upper)   // binary search for the largest size malloc() grants
{
    std::cout<<"["<<lower<<", "<<upper<<"]\n";
    if(upper-lower<=1)
    {
        std::cout<<"Found! "<<lower<<"\n";
        return;
    }
    std::size_t mid=lower+(upper-lower)/2;
    void *ptr=std::malloc(mid);
    if(ptr)
    {
        std::free(ptr);
        mallocbsearch(mid, upper);   // mid bytes worked: search higher
    }
    else
        mallocbsearch(lower, mid);   // mid bytes failed: search lower
}
int main()
{
    mallocbsearch(0, std::numeric_limits<std::size_t>::max());
    return 0;
}
on Ideone you find that the maximum allocation size is about 530 MB, which is probably a limit enforced by setrlimit (similar mechanisms exist on Windows).
[1] The limit varies between OSes and can often be configured; the total virtual address space of a 32 bit process is 4 GB, but on all the current mainstream OSes a big chunk of it (the upper 2 GB for 32 bit Windows with default settings) is reserved for kernel data.
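A per-process limit of the kind suspected on Ideone can be reproduced locally with setrlimit. The following is a sketch under a few assumptions: a POSIX system (tested mentally against Linux), RLIMIT_AS as the limit in question, and a hypothetical function name. The experiment runs in a forked child so that the lowered limit does not affect the calling process.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns 1 if malloc(request_bytes) returned NULL under an address-space
// limit of limit_bytes, 0 if it succeeded, -1 if the experiment could not run.
int malloc_fails_under_limit(std::size_t limit_bytes, std::size_t request_bytes) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: lower only the soft limit, then try the allocation.
        struct rlimit lim;
        if (getrlimit(RLIMIT_AS, &lim) != 0) _exit(2);
        lim.rlim_cur = limit_bytes;
        if (setrlimit(RLIMIT_AS, &lim) != 0) _exit(2);
        // volatile keeps the compiler from optimizing the malloc() away.
        void* volatile p = std::malloc(request_bytes);
        _exit(p == nullptr ? 1 : 0);
    }
    // Parent: translate the child's exit code into the return value.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

With a 1 GiB limit and a 2 GiB request, the child gets NULL from malloc immediately, independent of the overcommit setting, because the limit is on address space rather than on physical memory.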
The amount of memory available to the given process is finite. If the process exhausts its memory, and tries to allocate more, the allocation would fail.
There are other reasons why an allocation could fail. For example, the heap could get fragmented and not have a single free block large enough to satisfy the allocation request.

Why Does a Memory Leak not Continue after Peaking?

I created an intentional memory leak to demonstrate a point to people who will shortly be learning pointers.
int main()
{
    while (1)
    {
        int *a = new int [2];
        //delete [] a;
    }
}
If this is run with the delete line uncommented, the memory usage stays low and doesn't rise, as expected. However, if this is run as is, then on a machine with 2GB of RAM, the memory usage rapidly rises to about 1.5GB, or whatever is not in use by the system. Once it hits this point though, the CPU usage (which was previously max) greatly falls, and the memory usage as well, down to about 100MB.
What exactly caused this intervening action (if there's something more specific than "Windows", that'd be great), and why does the program not take up the CPU it would looping, but not terminate either? It seems like it's stuck between the end of the loop and the end of main.
Windows XP, GCC, MinGW.
What's probably happening is that your code allocates all available physical RAM. When it reaches that limit, the system starts to allocate space on the swap file for it. That means it's (nearly) constantly waiting on the disk, so its CPU usage drops to (almost) zero.
The system may easily keep track of the fact that it never actually writes to the memory it allocates, so when it needs to be stored on the swap file, it'll just make a small record basically saying "process X has N bytes of uninitialized storage" instead of actually copying all the data to the hard drive (but I'm not sure of that, and it may well depend on the exact system you're using).
To paraphrase Inigo Montoya, "I don't think that means what you think that means." The Windows task manager doesn't display the memory usage data that you are looking for.
The "Mem Usage" column displays something related to the working set size (or the resident set size) of the process. That is, "Mem Usage" displays a number related to the amount of physical memory currently allocated to your process.
The "VM Size" column displays a number wholly unrelated to the virtual memory subsystem (it is actually the size of the private heaps allocated by the process).
Try using a different tool to visualize virtual memory usage. I suggest Process Explorer.
I guess when the program exhausts the available physical memory, it starts to use on-disk (virtual) memory, and it becomes so slow, it seems as if it's inactive. Try adding some speed visualization:
int counter = 0;
while (1) {
    int *a = new int [2];
    // print progress every million allocations
    if (++counter % 1000000 == 0)
        std::cout << counter << '\n';
}
The default Memory column in the task manager of XP is the size of the working set of the process (the amount of physical memory allocated to that process), not the actual memory usage.
The "Mem Usage" column of the task manager is probably the "working set" as explained by a few answers in this question, although to be honest I still get confused how the task manager refers to memory as it changes from version to version. This value goes up/down as you are obviously not actually using much memory at any given time. If you look at the "VM Size" you should see it constantly increase until something bad happens.
You can also give Process Explorer a try; I find the way it displays things easier to understand.
Several things: first, if you're only allocating 2 ints at a time, it
could take hours before you notice that the total memory usage is going
up because of it. And second, on a lot of systems, allocation doesn't
commit until you actually access the memory; the address space may be
reserved, but you don't really have the memory (and the program will
crash if you try to access the memory and there isn't any available).
If you want to simulate a leak, I'd recommend allocating at least a page
at a time, if not considerably more, and writing at least one byte in
each allocated page.
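
A minimal sketch of that advice, assuming 4096-byte pages (the real page size is system-dependent) and using a made-up helper name, leak_pages. It is bounded so that it terminates; looping forever instead would reproduce the runaway leak.

```cpp
#include <cstddef>
#include <new>

// Assumption: 4096-byte pages. Query the OS for the real value in practice.
constexpr std::size_t kPageSize = 4096;

// Hypothetical helper: leaks `chunks` blocks of `pages_per_chunk` pages each
// and writes one byte into every page so the pages are actually used.
// Returns the number of bytes leaked.
std::size_t leak_pages(std::size_t chunks, std::size_t pages_per_chunk) {
    std::size_t leaked = 0;
    const std::size_t size = pages_per_chunk * kPageSize;
    for (std::size_t c = 0; c < chunks; ++c) {
        char *block = new (std::nothrow) char[size];
        if (block == nullptr) break;            // allocator refused: stop
        volatile char *p = block;               // volatile: keep the stores
        for (std::size_t off = 0; off < size; off += kPageSize)
            p[off] = 1;                         // touch one byte per page
        leaked += size;                         // intentionally never deleted
    }
    return leaked;
}
```

Under the 4 KiB assumption, leak_pages(16, 256) leaks 16 MiB, which shows up in a process monitor far sooner than a stream of two-int allocations.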

Malloc allocates memory more than RAM

I just ran a program that mallocs 13 MB on a machine with 12 MB of RAM (emulated with QEMU!). Not just that, I even walked through the memory and filled it with junk...
#define LONGMEM 13631488
long long *ptr = (long long *)malloc(LONGMEM);
long long i;
if(!ptr) {
printf("%s(): array allocation of size %lld failed.\n",__func__,(long long)LONGMEM);
}
for(i = 0 ; i < LONGMEM ; i++ ) {
How is this possible? I was expecting a segmentation fault.
It's called virtual memory which is allocated for your program. It's not real memory which you call RAM.
There is a max limit for virtual memory as well, but it's higher than RAM. It's implemented (and defined) by your operating system.
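On POSIX systems, one such limit (the per-process address-space cap) can be queried with getrlimit. The following is only a sketch: the helper names address_space_limit and describe_limit are made up for illustration, and other limits (overcommit policy, pointer width, swap size) are not covered by it.

```cpp
#include <string>
#include <sys/resource.h>   // getrlimit, RLIMIT_AS (POSIX)

// Hypothetical helper: fetches the soft and hard address-space limits of
// the current process. Returns false if the query fails.
bool address_space_limit(rlim_t &soft, rlim_t &hard) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0) return false;
    soft = rl.rlim_cur;
    hard = rl.rlim_max;
    return true;
}

// Hypothetical helper: renders a limit as text; RLIM_INFINITY means no cap.
std::string describe_limit(rlim_t value) {
    if (value == RLIM_INFINITY) return "unlimited";
    return std::to_string(static_cast<unsigned long long>(value)) + " bytes";
}
```

If the soft limit comes back as "unlimited", malloc is bounded only by the address space and the system's commit policy, which is why requests larger than RAM can succeed.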
This is called lazy allocation.
Most operating systems, Linux included, use a lazy allocation memory model: the address returned is a virtual address, and the actual allocation only happens at access time. The OS assumes that it will be able to provide the memory when it is accessed.
The memory allocated by malloc is not backed by real memory until the program actually touches it.
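By contrast, calloc has to return zeroed memory. If the implementation zeroes the block by writing to it, the OS must back the allocation with actual RAM (or swap) right away. This is not guaranteed, though: for large blocks some allocators (glibc, for example) obtain fresh pages from the kernel that are already zero and do not touch them, so the allocation can still be lazy.
Try using calloc; it may well report out of memory unless your swap file/partition is big enough to satisfy the request.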
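On Linux the effect can be observed from inside the process. The sketch below is an illustration only: it assumes /proc/self/statm is available (Linux-specific), and the names resident_pages, LazyResult and measure_lazy_allocation are invented for this example. It records the resident set size before malloc, after malloc, and after writing one byte to each page.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <unistd.h>   // sysconf (POSIX)

// Resident set size of this process in pages, read from /proc/self/statm
// (Linux-specific). Returns -1 if the file cannot be read.
long resident_pages() {
    std::ifstream statm("/proc/self/statm");
    long total = 0, resident = 0;
    if (!(statm >> total >> resident)) return -1;
    return resident;
}

struct LazyResult {
    long before_alloc;  // resident pages before malloc
    long after_alloc;   // resident pages after malloc, before any write
    long after_touch;   // resident pages after writing one byte per page
};

// Hypothetical helper: allocates `bytes`, then touches every page once.
LazyResult measure_lazy_allocation(std::size_t bytes) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    LazyResult r{resident_pages(), -1, -1};
    char *block = static_cast<char *>(std::malloc(bytes));
    if (block == nullptr) return r;             // allocation refused
    r.after_alloc = resident_pages();
    volatile char *p = block;                   // volatile: keep the stores
    for (std::size_t off = 0; off < bytes; off += page)
        p[off] = 1;                             // first write faults the page in
    r.after_touch = resident_pages();
    std::free(block);
    return r;
}
```

With a large request (say 64 MiB), after_alloc is typically close to before_alloc, while after_touch is larger by roughly the number of pages requested; the exact figures depend on the allocator and the kernel.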
Sounds like your operating system is swapping pages:
Paging is an important part of virtual memory implementation in most
contemporary general-purpose operating systems, allowing them to use
disk storage for data that does not fit into physical random-access
memory (RAM).
In other words, the operating system is using some of your hard disk space to satisfy your 13 MB allocation request (at great expense of speed, since the hard disk is much, much slower than RAM).
Unless the virtualized OS has swap available, what you're encountering is called overcommit, and it basically exists because the easy way to manage resources in a system with virtual memory and demand/copy-on-write pages is not to manage them. Overcommit is a lot like a bank loaning out more money than it actually has -- it seems to work for a while, then things come crashing down. The first thing you should do when setting up a Linux system is fix this with the command:
echo "2" > /proc/sys/vm/overcommit_memory
That just affects the currently running kernel; you can make it permanent by adding a line to /etc/sysctl.conf: