Quick'n'Dirty: Measure memory usage? - c++

There are plenty of detailed instructions for measuring C++ memory usage (example links at the bottom).
If I've got a program processing and displaying pixel data, I can use Windows Task Manager to spot memory leakage: if processing/displaying/closing multiple files in succession makes the executable's Memory (private working set) grow with each iteration, something is leaking. Not perfect, but when processing thousands of frames of data this works as a quick'n'dirty check.
To chase down a memory bug in that (large) project, I wrote a program to measure memory usage accurately, using Lanzelot's useful answer, namely the part titled "Total Physical Memory (RAM)". But if I calloc a single double, I get 577536. Even if that figure were in bits, it would be a lot.
I tried writing a bog-standard program that pauses, allocates some memory (say, calloc a megabyte's worth of data), and pauses again before freeing said memory. The pauses are long enough to let me comfortably look at Windows Task Manager. Yet the executable only grows by 4 K(!) per allocation.
What am I missing here? Is QtCreator or the compiler optimising the allocated memory away? Why does the big, complex project seemingly show memory usage changes with a resolution down to ~1 MB, while whatever allocation size I fiddle with in my simple program barely moves the figure shown in Windows Task Manager at all?
C++: Measuring memory usage from within the program, Windows and Linux
https://stackoverflow.com/a/64166/2903608
--- Edit ---
Example code, as simple as:
double *pDouble = (double*) calloc(1, sizeof(double));
*pDouble = 5.0;
qDebug() << "*pDouble: " << *pDouble;
If I look at Windows Task Manager, this takes 4 K (whether for 1 or 1,000,000 doubles). With Lanzelot's solution, north of 500k.
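For what it's worth, the "Total Physical Memory (RAM)" part of that answer measures the machine, not the process, which is why its numbers dwarf a single calloc. A figure comparable to Task Manager's per-process columns comes from the process's own counters. Here is a minimal sketch of my own (not from Lanzelot's answer), assuming the Win32 PSAPI route via GetProcessMemoryInfo and linking against Psapi.lib:

// Hedged sketch: query the current process's counters via PSAPI, which roughly
// match Task Manager's per-process memory columns. Link against Psapi.lib.
#include <windows.h>
#include <psapi.h>
#include <cstdio>
#include <cstdlib>

static void printUsage(const char *label)
{
    PROCESS_MEMORY_COUNTERS_EX pmc = {};
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             reinterpret_cast<PROCESS_MEMORY_COUNTERS *>(&pmc),
                             sizeof(pmc)))
    {
        // WorkingSetSize ~ working set, PrivateUsage ~ private/commit size
        std::printf("%s: working set %zu KB, private %zu KB\n", label,
                    static_cast<std::size_t>(pmc.WorkingSetSize) / 1024,
                    static_cast<std::size_t>(pmc.PrivateUsage) / 1024);
    }
}

int main()
{
    printUsage("before");
    double *p = static_cast<double *>(std::calloc(1000000, sizeof(double))); // ~8 MB
    for (std::size_t i = 0; i < 1000000; ++i)
        p[i] = 5.0;                  // touch the pages so they become resident
    printUsage("after");
    std::free(p);
    return 0;
}

Printing the counters before and after the calloc (and after touching the memory) should show a difference in the megabyte range rather than the 4 K quantum you see in Task Manager.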

Related

top may be showing incorrect memory usage

I am writing a simple C++ program on Mac OS. I have just
int main()
{
    int *n = new int[50000000];
}
I launch this program in lldb, and put a breakpoint at the line where n is allocated. Then I launch top in another tab and see that memory usage is 336K pre-allocation. When I step over the line in lldb (with n), so that the allocation for n happens, I expect my memory usage to go up. However, top shows me the same amount of memory used by my program. What could be the reason for this? I am trying to understand how memory allocation happens in C++, which is why I am doing this.
I have not exited the scope of main; when I check top again, I am sitting at the closing curly brace of main.
The top command shows the process stats as viewed by the operating system. It shows how much memory was allocated to the process, but not how much of this memory is effectively in use. It's not accurate for monitoring memory allocation.
Memory allocation on the heap/free store is implementation-dependent in C++, but it's usually not mapped one-to-one onto OS allocation calls. For performance reasons (calls to the OS are slower than calls inside your userland code), memory is obtained from the OS in larger chunks:
When the C++ runtime starts, it usually allocates some memory from the OS for the standard library objects it needs and to initialize the free store, so that it can quickly satisfy allocation requests.
Only if this initial memory is exhausted will the standard library allocate more memory from the operating system.
That allocation is again done in larger chunks, so that not every new results in an OS call.
From your observations, I guess that this initial allocation is larger than 50 MB. Try with a much larger value to see the difference.
If you want to track memory consumption more precisely, you need profiling tools, for example valgrind or the heap command.
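A quick way to see the effect for yourself (my own minimal sketch, assuming a POSIX system so sleep() and getpid() are available): the new[] alone barely moves the resident size shown by top, but touching the pages does.

#include <cstring>
#include <iostream>
#include <unistd.h>   // getpid(), sleep()

int main()
{
    std::cout << "pid " << getpid() << ": before allocation" << std::endl;
    sleep(10);                                  // check top: small RSS

    const std::size_t count = 50000000;         // ~200 MB worth of ints
    int *n = new int[count];                    // address space reserved, pages mostly not resident
    std::cout << "allocated" << std::endl;
    sleep(10);                                  // check top: RSS usually still small

    std::memset(n, 0, count * sizeof(int));     // touching every page makes it resident
    std::cout << "touched" << std::endl;
    sleep(10);                                  // check top: RSS now around 200 MB

    delete[] n;
    return 0;
}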

Allocating more memory than there exists using malloc

This code snippet allocates 2 GB every time it reads the letter 'u' from stdin, and initializes all of the allocated bytes once it reads 'a'.
#include <iostream>
#include <cstdlib>
#include <vector>

#define bytes 2147483648ULL   // 2 GB per allocation

using namespace std;

int main()
{
    char input = 0;
    vector<char *> activate;

    while (input != 'q')
    {
        cin >> input;
        if (input == 'u')
        {
            // Allocate 2 GB but do not touch it yet.
            char *m = (char *) malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else cout << "ok" << endl;
            activate.push_back(m);
        }
        else if (input == 'a')
        {
            // Touch every byte of every allocated block.
            for (size_t i = 0; i < activate.size(); i++)
            {
                char *m = activate[i];
                for (size_t j = 0; j < bytes; j++)
                    m[j] = 'a';
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage with the htop tool, I realized that the malloc operation is not reflected in the resources.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM and swap memory combined) and malloc allows it (i.e. NULL is not returned by malloc). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
It is not a kernel feature that I like (so I always disable it). See malloc(3), mmap(2) and proc(5).
NB: echo 0 instead of echo 2 often, but not always, also works. Read the docs (in particular the proc man page linked above).
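If you would rather check the current setting from inside a program than from the shell, the same proc file can simply be read. A small sketch of my own, Linux only (0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting):

#include <fstream>
#include <iostream>

int main()
{
    // Read the current overcommit policy from procfs.
    std::ifstream f("/proc/sys/vm/overcommit_memory");
    int mode = -1;
    if (f >> mode)
        std::cout << "vm.overcommit_memory = " << mode << '\n';
    else
        std::cout << "could not read /proc/sys/vm/overcommit_memory\n";
    return 0;
}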
from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
So when you just allocate too much, malloc "lies" to you; when you actually want to use the allocated memory, the kernel has to find enough physical memory for you, and your program may crash (or be killed) if it can't.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address range allocated with malloc(), the kernel does little more than "reserve" that range. This of course depends on the implementation of your memory allocator and operating system, but most good ones do not incur the majority of the kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator by zero-filling the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned: overcommitted memory. However, the explanation is kind of interesting.
Basically, when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of RAM/swap.
This is desirable behavior on most Linux environments, unless you've got a real-time OS or something where you know exactly who will need what resources, when, and why.
Otherwise somebody could come along, malloc up all the RAM (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside - but you only have a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles (which is nifty).
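To illustrate that last point, here is a minimal mmap() sketch of my own (POSIX only; the file name "big.dat" is just a placeholder): the whole file is mapped into the address space up front, but physical pages are only faulted in as they are actually read.

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("big.dat", O_RDONLY);                // may be far larger than RAM
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    // Reserve address space for the whole file; nothing is resident yet.
    char *p = static_cast<char *>(mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    // Reading a byte faults in just that page, not the whole file.
    std::printf("first byte: %d\n", p[0]);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}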
Initializing / working with the memory should work:
memset(m, 0, bytes);
Alternatively, you could use calloc, which not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1. Is this a kernel bug?
No.
2. Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigating the need to know the eventual memory requirement - it's often convenient for an application to be able to ask for an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or realloc()ing successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries is quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a large allocation of virtual address space is very cheap.
Sparse data - if you have virtual address space to spare, being able to have a sparse array and use direct indexing, or to allocate a hash table with a generous capacity()-to-size() ratio, can lead to a very high-performance system (see the sketch after this list). Both work best (in the sense of having low overheads/waste and making efficient use of memory caches) when the data element size is a multiple of the memory paging size, or, failing that, much larger or a small integral fraction thereof.
Resource sharing - consider an ISP offering a "1 gigabit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll each get about 1 megabit, but they rely on the real-world experience that, though people ask for 1 gigabit and want a good fraction of it at specific times, there's inevitably some lower maximum and a much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise could, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging to and from swap on disk may kick in and reduce performance. But unlike an Internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit is exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. In summary, with this memory overcommit behaviour enabled, simply checking that malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.
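As a concrete illustration of the sparse-data point, here is a sketch of my own (assuming a 64-bit build with overcommit enabled): the reservation below is roughly 4 GB of address space, yet only the handful of touched pages ever becomes resident.

#include <cstdlib>
#include <iostream>

int main()
{
    const std::size_t slots = 1ULL << 30;               // ~1 billion ints, ~4 GB of address space
    int *table = static_cast<int *>(std::calloc(slots, sizeof(int)));
    if (!table) { std::cerr << "calloc failed\n"; return 1; }

    // Direct indexing into a huge, mostly-empty table: only these pages are faulted in.
    table[42]        = 1;
    table[123456789] = 2;
    table[999999999] = 3;

    std::cout << table[123456789] << '\n';
    std::free(table);
    return 0;
}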

How to allocate large memory blocks on Windows 7

I am trying to allocate a large block (5 GB) of memory under Windows 7 Professional. I have a 64-bit machine with 16 GB of RAM, and I'm using MS Visual Studio 10. For those of you who might ask why: it's because I need to hold a 2-D raster representation of a map of reference numbers to polygon data, and the maps can be up to 40,000 x 40,000 units. This has to go into RAM, and fragmenting it into smaller blocks would be too expensive at runtime.
So if I do this:
int _tmain(int argc, _TCHAR* argv[])
{
    int t = INT_MAX;
    int *test = new int[t / 6];
    delete[] test;   // array new must be paired with delete[]
    return 0;
}
The call to new fails, but
int * test = new int[t/7]; succeeds.
Investigating a bit more I found that the memory allocation is only using the memory which is designated as 'free' by the resource monitor. So when this is smaller than the allocation requested the allocation fails. The resource monitor tells me that (when I looked) I had ~5GB in use, ~10GB on standby, and a little over 1GB free.
As far as I understand it, this should not happen. Surely if memory is requested it should be taken from the standby memory? If this is not the case, is there a way to reduce the amount of standby memory used by Windows from inside C++?
The latter can be done outside C++ using RAMMap, as I discovered from this post: Clear the windows 7 standby memory programmatically. Unfortunately there was no useful answer there to the question of programmatic clearing. Perhaps I'll be luckier in C++.
Of course, the more likely scenario is that I am just missing something obvious.
Thanks
Remember that when you allocate on the heap, the memory has to be one contiguous block. If the address space is fragmented, no block big enough to hold the allocation may be available, even though the total amount of free memory is much larger than what you request.
Memory fragmentation might be one of the issues: you could have that amount of memory free, but not contiguous.
A better approach would be to allocate fairly big chunks of memory (several megabytes each) and link them in a list (see the sketch below). The probability that you will find space for such a chunk (and thus for each of the chunks) is much higher than the probability of finding contiguous space for several gigabytes.
As for performance: as long as you are working within exactly one chunk, you lose no speed, since the data remains in the CPU cache. If you switch often between two chunks (e.g. you swap items between them), you will get cache misses.
In any case, workarounds like clearing the standby memory are like poker: you might get your chunk, or you might not. It depends on too many factors you can't control.
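Here is a rough sketch of the chunked layout suggested above (the class name and sizes are mine, not from the question): the raster is split into ~4 MB chunks so no single multi-gigabyte contiguous block is ever requested, and cells are still reached by plain row/column indexing.

#include <cstddef>
#include <cstdint>
#include <vector>

class ChunkedRaster
{
public:
    ChunkedRaster(std::size_t width, std::size_t height)
        : width_(width),
          chunks_((width * height + kChunkCells - 1) / kChunkCells)
    {
        for (std::vector<std::int32_t> &c : chunks_)
            c.resize(kChunkCells);                   // each chunk allocated separately (~4 MB)
    }

    std::int32_t &at(std::size_t x, std::size_t y)
    {
        const std::size_t idx = y * width_ + x;      // flatten the 2-D index
        return chunks_[idx / kChunkCells][idx % kChunkCells];
    }

private:
    static constexpr std::size_t kChunkCells = 1u << 20;  // 1M cells (~4 MB) per chunk
    std::size_t width_;
    std::vector<std::vector<std::int32_t>> chunks_;
};

int main()
{
    // ~6.4 GB in total, as in the question; shrink the dimensions for a quick test.
    ChunkedRaster raster(40000, 40000);
    raster.at(39999, 39999) = 7;
    return raster.at(39999, 39999) == 7 ? 0 : 1;
}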

Why Does a Memory Leak not Continue after Peaking?

I created an intentional memory leak to demonstrate a point to people who will shortly be learning pointers.
int main()
{
    while (1)
    {
        int *a = new int[2];
        //delete [] a;
    }
}
If this is run with the delete uncommented, the memory stays low and doesn't rise, as expected. However, if it is run as-is (leak included), then on a machine with 2 GB of RAM the memory usage rapidly rises to about 1.5 GB, or whatever is not in use by the system. Once it hits that point, though, the CPU usage (which was previously maxed out) falls sharply, and so does the memory usage, down to about 100 MB.
What exactly causes this intervening action (if there's something more specific than "Windows", that'd be great), and why does the program neither take up the CPU it would while looping nor terminate? It seems like it's stuck between the end of the loop and the end of main.
Windows XP, GCC, MinGW.
What's probably happening is that your code allocates all available physical RAM. When it reaches that limit, the system starts to allocate space on the swap file for it. That means it's (nearly) constantly waiting on the disk, so its CPU usage drops to (almost) zero.
The system may easily keep track of the fact that your program never actually writes to the memory it allocates, so when that memory needs to be stored in the swap file, it can just make a small record basically saying "process X has N bytes of uninitialized storage" instead of actually copying all the data to the hard drive (but I'm not sure of that, and it may well depend on the exact system you're using).
To paraphrase Inigo Montoya, "I don't think that means what you think that means." The Windows task manager doesn't display the memory usage data that you are looking for.
The "Mem Usge" column displays something related to the working set size (or the resident set size) of the process. That is, "Mem Usage" displays a number related to the amount of physical memory currently allocated to your proccess.
The "VM Size" column displays a number wholly unrelated to the virtual memory subsystem (it is actually the size of the private heaps allocated by the process.
Try using a different tool to visual virtual memory usage. I suggest Process Explorer.
I guess that when the program exhausts the available physical memory, it starts to use on-disk (virtual) memory, and it becomes so slow that it seems as if it's inactive. Try adding some progress output to see:
int counter = 0;
while (1)
{
    int *a = new int[2];
    ++counter;
    if (counter % 1000000 == 0)
        std::cout << counter << '\n';
}
The default Memory column in the task manager of XP is the size of the working set of the process (the amount of physical memory allocated to that process), not the actual memory usage.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684891%28v=vs.85%29.aspx
http://blogs.msdn.com/b/salvapatuel/archive/2007/10/13/memory-working-set-explored.aspx
The "Mem Usage" column of the task manager is probably the "working set" as explained by a few answers in this question, although to be honest I still get confused how the task manager refers to memory as it changes from version to version. This value goes up/down as you are obviously not actually using much memory at any given time. If you look at the "VM Size" you should see it constantly increase until something bad happens.
You can also given Process Explorer a try which I find easily to understand in how it displays things.
Several things: first, if you're only allocating 2 ints at a time, it could take hours before you notice that the total memory usage is going up because of it. And second, on a lot of systems, allocation doesn't commit until you actually access the memory; the address space may be reserved, but you don't really have the memory (and the program will crash if you try to access the memory and there isn't any available).
If you want to simulate a leak, I'd recommend allocating at least a page at a time, if not considerably more, and writing at least one byte in each allocated page.
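Putting that advice into code (a small sketch of my own; 4096 is an assumed page size):

#include <cstddef>

int main()
{
    const std::size_t pageSize = 4096;       // assumed page size
    while (true)
    {
        char *block = new char[pageSize];    // intentionally never deleted
        block[0] = 1;                        // write one byte so the page is really committed
    }
}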

C++ new / new[], how is it allocating memory?

I would like to know how these expressions allocate memory.
For example, say I have this code:
x = new int[5];
y = new int[5];
When these are allocated, what does it actually look like in RAM?
Is a whole block (a memory page or whatever you call it - 4 KB in size on 32-bit) reserved for each of the variables, or is one block shared between the two?
I couldn't find an answer to my question in any manual. Thanks for all replies.
I found this on Wikipedia:
Internal fragmentation of pages
Rarely do processes require the use of an exact number of pages. As a result, the last page will likely only be partially full, wasting some amount of memory. Larger page sizes clearly increase the potential for wasted memory this way, as more potentially unused portions of memory are loaded into main memory. Smaller page sizes ensure a closer match to the actual amount of memory required in an allocation.
As an example, assume the page size is 1024KB. If a process allocates 1025KB, two pages must be used, resulting in 1023KB of unused space (where one page fully consumes 1024KB and the other only 1KB).
And that was the answer to my question. Anyway, thanks guys.
A typical allocator implementation will first call the operating system to get a huge block of memory, and then, to satisfy your request, it will hand you a piece of that memory; this is known as suballocation. If it runs out of memory, it will get more from the operating system.
The allocator must keep track of both all the big blocks it got from the operating system and also all the small blocks it handed out to its clients. It also must accept blocks back from clients.
A typical suballocation algorithm keeps a list of returned blocks of each size called a freelist and always tries to satisfy a request from the freelist, only going to the main block if the freelist is empty. This particular implementation technique is extremely fast and quite efficient for average programs, though it has woeful fragmentation properties if request sizes are all over the place (which is not usual for most programs).
Modern allocators like GNU's malloc implementation are complex, but have been built with many decades of experience and should be considered so good that it is very rare to need to write your own specialised suballocator.
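A toy illustration of that freelist idea (my own sketch, nothing like the real glibc implementation): a fixed-size-block pool grabs one slab from malloc up front, threads every block onto a singly linked free list, and then serves and recycles blocks without touching the OS again.

#include <cstddef>
#include <cstdlib>
#include <iostream>

class FixedPool
{
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(blockSize < sizeof(Node) ? sizeof(Node) : blockSize)
    {
        freeList_ = nullptr;
        slab_ = static_cast<char *>(std::malloc(blockSize_ * blockCount)); // one big OS-level allocation
        if (!slab_) return;
        for (std::size_t i = 0; i < blockCount; ++i)
        {
            Node *n = reinterpret_cast<Node *>(slab_ + i * blockSize_);
            n->next = freeList_;                 // thread every block onto the free list
            freeList_ = n;
        }
    }
    ~FixedPool() { std::free(slab_); }

    void *allocate()
    {
        if (!freeList_) return nullptr;          // a real allocator would ask the OS for another slab
        Node *n = freeList_;
        freeList_ = n->next;
        return n;
    }

    void deallocate(void *p)                     // accept the block back onto the free list
    {
        Node *n = static_cast<Node *>(p);
        n->next = freeList_;
        freeList_ = n;
    }

private:
    struct Node { Node *next; };
    char *slab_ = nullptr;
    Node *freeList_ = nullptr;
    std::size_t blockSize_;
};

int main()
{
    FixedPool pool(sizeof(double), 1024);
    double *a = static_cast<double *>(pool.allocate());
    *a = 3.14;
    std::cout << *a << '\n';
    pool.deallocate(a);
}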
You didn't find it in the manual because it's not specified by the standard. That said, most of the time x and y will end up side by side (go ahead and cout << hex << their addresses).
But nothing in the standard forces this, so you can't rely on it.
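A quick way to check that for yourself (a trivial sketch): print both addresses; on most implementations they come out only a small, allocator-dependent distance apart.

#include <iostream>

int main()
{
    int *x = new int[5];
    int *y = new int[5];
    // Typically the two blocks land close together, but nothing guarantees it.
    std::cout << static_cast<void *>(x) << ' ' << static_cast<void *>(y) << '\n';
    delete[] x;
    delete[] y;
    return 0;
}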
Each process has several segments that divide up its address space:
1) The text segment: where your code is placed.
2) The stack segment: the process stack.
3) The data segment: this is where memory obtained with new is reserved. It also stores initialized and uninitialized static data (bss etc.).
So, whenever you call new (which I guess uses malloc internally, but makes memory handling much safer), it allocates the requested number of bytes in the data segment.
Of course, the address you print while running the program is virtual and needs to be translated to a physical address, but that is not our headache; the OS and the memory management unit do that for us.