How can I measure runtime memory requirements of an application on windows platform?

How can I measure runtime memory requirements of an application on windows platform? - c++

How can I measure runtime memory requirements of an application on windows platform?

Perfmon.exe will monitor the usage of a process.
Run perfmon.exe, right-click Add counters, pick Process for the Performance Object, then choose things like Virtual Bytes, Working Set, and Page File.

I'll assume you mean memory use at a particular point in time, not how much it could potentially ever need.
You can get the information about how much a process is consuming through the windows API, for example GetProcessMemoryInfo. Windows allocates memory in blocks so it may be more accurate than just checking how much memory or heap space is used.
See more details from MSDN

"Memory requirements" are not very well defined, in the first place. WHen you start, your executabel will be linked to many DLLs. Together with the first stack, this forms your initial process. Then, your proces might start extra threads, allocate more memory, and/or memory map some files.
Now Wwindows won't give you real RAM for all these needs. Many DLLs are already loaded for other reasons, so you'll share that RAM. Extra RAM for stacks is allocated when you get soft stackoverflows. Memory mapped files get RAM allocated when those pages fault.
So, one of the important questions is what you really want. You have to answer that first.

Related

What part of the process virtual memory does Windows Task Manager display

My question is a bit naive. I'm willing to have an overview as simple as possible and couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly is the memory displayed in the "memory" column by default in Windows Task Manager:
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also I'm focussed on the big picture and mainly care for things at GB level.
As far as I know, the memory reserved by the process called "virtual memory", is partly stored in the main memory (RAM), partly on the disk. The system decides what goes where. The system basically keeps in RAM the parts of the virtual memory that is accessed sufficiently frequently by the process. A process can reserve more virtual memory than RAM available in the computer.
From a developer point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X() for example). I guess the system has no awareness of what part of the virtual memory is allocated since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by the task manager is essentially the part of the virtual memory being stored in RAM by the system. Q: Is it correct? And is there a simple way to know the total virtual memory reserved by the process?

Memory on windows is... extremely complicated and asking 'how much memory does my process use' is effectively a nonsensical question. TO answer your questions lets get a little background first.
Memory on windows is allocated via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Any reserved memory just uses up address space and so isn't interesting. Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources but not in the way you'd think. When you call commit windows does a few sums and basically works out (total physical ram + total swap - current commit) and lets you allocate memory if there's enough free. BUT the windows memory manager doesn't actually give you physical ram until you actually use it.
Later, however, if windows is tight for physical RAM it'll swap some of your RAM out to disk (it may compress it and also throw away unused pages, throw away anything directly mapped from a file and other optimisations). This means your total commit and total physical ram usage for your program may be wildly different. Both numbers are useful depending on what you're measuring.
There's one last large caveat - memory that is shared. When you load DLLs the code, the read-only memory [and even maybe the read/write section but this is COW'd] can be shared with other programs. This means that your app requires that memory but you cannot count that memory against just your app - after all it can be shared and so doesn't take up as much physical memory as a naive count would think.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses and you see none of this - you ask for and use memory. And its just as optimal as possible.
You can see this by going to the details tab and looking at the various options - commit-size and working-set are really useful. If you just look at the main window in task-manager and it has a single value I'd hope you understand now that a single value for memory used has to be some kind of compromise as its not a question that makes sense.
Now to answer your questions
Firstly the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is if the heap implementation you (or more likely the CRT) are using has kept some freed memory about which it hasn't released back to the operation system. Heaps often do this as an optimisation - asking for memory from the OS and freeing it back to the OS is a fairly expensive operation (and can only be done in large chunks known as pages) and so most of them keep some around.
Second question: Dont use that value, go to details and use the values there as only you know what you actually want to ask.
EDIT:
For your comment, yes, but this depends on the size of the allocation. If you allocate a large block of memory (say >= 1MB) then the heap in the CRT generally directly defers the allocation to the operating system and so freeing individual ones will actually free them. For small allocations the heap in the CRT asks for pages of memory from the operating system and then subdivides that to give out in allocations. And so if you then free every other one of those you'll be left with holes - and the heap cannot give those holes back to the OS as the OS generally only works in whole pages. So anything you see in task manager will show that all the memory is still used. Remember this memory isn't lost or leaked, its just effectively pooled and will be used again if allocations ask for that size. If you care about this memory you can use the crt heap statistics famliy of functions to keep an eye on those - specifically _CrtMemDumpStatistics

Memory usage and minimizing

We have a fairly graphical intensive application that uses the FOX toolkit and OpenSceneGraph, and of course C++. I notice that after running the application for some time, it seems there is a memory leak. However when I minimize, a substantial amount of memory appears to be freed (as witnessed in the Windows Task Manager). When the application is restored, the memory usage climbs but plateaus to an amount less than what it was before the minimize.
Is this a huge indicator that we have a nasty memory leak? Or might this be something with how Windows handles graphical applications? I'm not really sure what is going on.

What you are seeing is simply memory caching. When you call free()/delete()/delete, most implementations won't actually return this memory to the OS. They will keep it to be returned in a much faster fashion the next time you request it. When your application is minimized, they will free this memory because you won't be requesting it anytime soon.
It's unlikely that you have an actual memory leak. Task Manager is not particularly accurate, and there's a lot of behaviour that can change the apparent amount of memory that you're using- even if you released it properly. You need to get an actual memory profiler to take a look if you're still concerned.
Also, yes, Windows does a lot of things when minimizing applications. For example, if you use Direct3D, there's a device loss. There's thread timings somethings. Windows is designed to give the user the best experience in a single application at a time and may well take extra cached/buffered resources from your application to do it.

No, there effect you are seeing means that your platform releases resources when it's not visible (good thing), and that seems to clear some cached data, which is not restored after restoring the window.
Doing this may help you find memory leaks. If the minimum amount of memory (while minimized) used by the app grows over time, that would suggest a leak.

You are looking at the working set size of your program. The sum of the virtual memory pages of your program that are actually in RAM. When you minimize your main window, Windows assumes the user won't be interested in the program for a while and aggressively trims the working set. Copying the pages in RAM to the paging file and chucking them out, making room for the other process that the user is likely to start or to switch to.
This number will also go down automatically when the user starts another program that needs a lot of RAM. Windows chucks out your pages to make room for this program. It picks pages that your program hasn't used for a while, making it likely that this doesn't affect the perf of your program much.
When you switch back to your program, Windows needs to swap pages back into RAM. But this is on-demand, it only pages-in pages that your program actually uses. Which will normally be less than what it used before, no need to swap the initialization code of your program back in for example.
Needless to say perhaps, the number has absolutely nothing to do with the memory usage of your program, it is merely a statistical number.
Private bytes would be a better indicator for a memory leak. Taskmgr doesn't show that, SysInternals' ProcMon tool does. It still isn't a great indicator because that number also includes any blocks in the heap that were freed by your program and were added to the list of free blocks, ready to be re-used. There is no good way to measure actual memory in use, read the small print for the HeapWalk() API function for the kind of trouble that causes.
The memory and heap manager in Windows are far too sophisticated to draw conclusions from the available numbers. Use a leak detection tool, like the VC debug allocator (crtdbg.h).

Information about PTE's (Page Table Entries) in Windows

In order to find more easily buffer overflows I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes. Then I change the page protection and size so that if the caller writes before or after its allocated piece of memory, the application immediately crashes.
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory. This has two causes:
since every allocation needs 4 KB, we probably reach the 2 GB limit very soon. This problem could be solved if I would make a 64-bit executable (didn't try it yet).
even when I only need a few hundreds of megabytes, the allocations fail at a certain moment.
The second problem is the biggest one, and I think it's related to the maximum number of PTE's (page table entries, which store information on how Virtual Memory is mapped to physical memory, and whether pages should be read-only or not) you can have in a process.
My questions (or a cry-for-tips):
Where can I find information about the maximum number of PTE's in a process?
Is this different (higher) for 64-bit systems/applications or not?
Can the number of PTE's be configured in the application or in Windows?
Thanks,
Patrick
PS. note for those who will try to argument that you shouldn't write your own memory manager:
My application is rather specific so I really want full control over memory management (can't give any more details)
Last week we had a memory overwrite which we couldn't find using the standard C++ allocator and the debugging functionality of the C/C++ run time (it only said "block corrupt" minutes after the actual corruption")
We also tried standard Windows utilities (like GFLAGS, ...) but they slowed down the application by a factor of 100, and couldn't find the exact position of the overwrite either
We also tried the "Full Page Heap" functionality of Application Verifier, but then the application doesn't start up either (probably also running out of PTE's)

There is what i thought was a great series of blog posts by Mark Russinovich on technet called "Pushing the limits of Windows..."
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
It has a few articles on virtual memory, paged nonpaged memory, physical memory and others.
He mentions little utilities he uses to take measurements about a systems resources.
Hopefully you will find your answers there.

A shotgun approach is to allocate those isolated 4KB entries at random. This means that you will need to rerun the same tests, with the same input repeatedly. Sometimes it will catch the error, if you're lucky.
A slightly smarter approach is to use another algorithm than just random - e.g. make it dependent on the call stack whether an allocation is isolated. Do you trust std::string users, for instance, and suspect raw malloc use?

Take a look at the implementation of OpenBSD malloc. Much of the same ideas (and more) implemented by very skilled folk.

In order to find more easily buffer
overflows I am changing our custom
memory allocator so that it allocates
a full 4KB page instead of only the
wanted number of bytes.
This has already been done. Application Verifier with PageHeap.
Info on PTEs and the Memory architecture can be found in Windows Internals, 5th Ed. and the Intel Manuals.
Is this different (higher) for 64-bit systems/applications or not?
Of course. 64bit Windows has a much larger address space, so clearly more PTEs are needed to map it.
Where can I find information about the
maximum number of PTE's in a process?
This is not so important as the maximum amount of user address space available in a process. (The number of PTEs is this number divided by the page size.)
This is 2GB on 32 bit Windows and much bigger on x64 Windows. (The actual number varies, but it's "big enough").
Problem is that although I have enough
memory, the application never starts
up completely because it runs out of
memory.
Are you a) leaking memory? b) using horribly inefficient algorithms?

Any useful suggestions to figure out where memory is being free'd in a Win32 process?

An application I am working with is exhibiting the following behaviour:
During a particular high-memory operation, the memory usage of the process under Task Manager (Mem Usage stat) reaches a peak of approximately 2.5GB (Note: A registry key has been set to allow this, as usually there is a maximum of 2GB for a process under 32-bit Windows)
After the operation is complete, the process size slowly starts decreasing at a rate of 1MB per second.
I am trying to figure out the easiest way to quickly determine who is freeing this memory, and where it is being free'd.
I am having trouble attaching a memory profiler to my code, and I don't particularly want to override the new/delete operators to track the allocations/deallocations (IOW, I want to do this without re-compiling my code).
Can anyone offer any useful suggestions of how I could do this via the Visual Studio debugger?
Update
I should also mention that it's a multi-threaded application, so pausing the application and analysing the call stack through the debugger is not the most desirable option. I considered freezing different threads one at a time to see if the memory stops reducing, but I'm fairly certain this will cause the application to crash.

Ahh! You're looking at the wrong counter!
Mem Usage doesn't tell you that memory is being freed. Only that the working set is being purged! This could mean some other application needs memory, or the VMM decided to mark some of your process's pages as Stand By for some other process to quickly use. It does not mean that VirtualFree, HeapFree or any other free function is being called.
Look at the commit size (VM Size, Private Bytes, etc).
But if you still want to know when memory is being decommitted or freed or what-have-you, then break on some free calls. E.g. (for Visual C++)
{,,kernel32.dll}HeapFree
or
{,,msvcr80.dll}free
etc.
Or just a regular function breakpoint on the above. Just make sure it resolves the address.
cdb/WinDbg let you do it via
bp kernel32!HeapFree
bp msvcrt!free
etc.
Names may vary depending on which CRT version you use and how you link against it (via /MT or /MD and its variants)

You might find this article useful:
http://www.gamasutra.com/view/feature/1430/monitoring_your_pcs_memory_usage_.php?print=1
basically what I had in mind was hooking the low level allocation functions.

A couple different ideas:
The C runtime has a set of memory debugging functions; you'd need to recompile though. You could get a snapshot at computation completion and later, and use _CrtMemDifference to see what changed.
Or, you can attach to the process in your debugger, and cause it to dump a core before and after the memory freeing. Using NTSD, you can see what heaps are around, and the sizes of things. (You'll need a lot of disk space, and a fair amount of patience.) There's a setting (I think you get it through gflags, but I don't remember) that causes it to save a piece of the call stack as part of the dump; using that you can figure out what kind of object is being deallocated. Unfortunately, it only stores 4 or 5 stack frames, so you'll likely have to do something more clever as the next step to figure out where it's being freed. Either look at the code ("oh yeah, there's only one place where that can happen") or put in breakpoints on those destructors, or add tracing to the allocations and deallocations.

If your memory manager wipes free'd data to a known value (usually something like 0xfeeefeee), you can set a data breakpoint on a particular instance of something you're interested in. When it gets free'd, the breakpoint will trigger when the memory gets wiped.

I recommend you to check UMDH tool that comes with Debugging Tools For Windows (You can find usage and samples in the debugging tools help). You can snap shot running process's heap allocations with stack trace and compare them.

You could try Memory Validator to monitor the allocations and deallocations. Memory Validator has a couple of features that will help you identify where data is being deallocated:
Hotspots view. This can show you a tree of all allocations and deallocations or just all allocations or just all deallocations. It presents the data as a percentage of memory activity (based on amount of memory (de)allocated at a given location).
Analysis view. You can perform queries asking for data in a given address range. You can restrict these queries to any of alloc, realloc, dealloc behaviours.
Objects view. You can view allocations by type and see the maximum number of objects of each type (plus lots of other stats). Right click on a type to get a context menu, choose show all deallocations - will show deallocation locations for that type on Analysis tab.
I think the Hotspots view may give you the insight you need.

Which one is faster, reading from disk or allocate system memory

My environment is XP 32-bit. I find when allocated memory is nearly the maximum size, 2GB, that means a little virtual space is available, allocationnew memory is very slow.
So if I have a page file, my app need to analyze them.
I have two ways. One is to read them all into system memory, then do the analysis.
The other is to reserv a memory buffer first as a cache, and read part of page file into that buffer, analyze and then discard it, then read second part of page file, and override the cache, do the analysis again.
From the profiling, it looks the second one is faster, since it avoid the allocation time cost.
What do you think? Thanks in adavance.

(1) I'm not sure the question matches the title. If you're allocating close to 2GB of RAM on 32 bit Windows, the system is probably paging a lot of memory to disk, and that's where I'd look first for the slow down. When you're using a lot of memory, you should think of it as being stored on disk (in pagefile.sys) but cached in physical RAM. The second one might be faster not because of the cost of doing allocation, but because of the cost of using a lot of memory at once. In effect when you copy the file into one big allocation you're copying much of it disk->disk via RAM, then when you run over it again to analyse, you're loading the copy back to RAM again. If your analysis is a single-pass algorithm that's a lot of redundant work.
(2) What I think is, mmap the file (MapViewOfFile and friends on Windows).
Edit: (3) a caution. If the file is currently 1.8GB, there might be a chance that next year it might be 4GB. If so, I'd plan now for it to have a size greater than 2^32 on a 32bit machine, which means either taking your second option, or else still using MapViewOfFile but doing it one sensible-sized chunk of the file at a time, rather than all at once. Otherwise you'll be revisiting this code the first time someone tries it on a big file and reports the bug.

You forget 3d way - to map memory onto file, see function CreateFileMapping/MapViewOfFile
This is most fast way

You best bet is to use the windows MapViewOfFile and similar functions (the Windows equivalent of mmap). This will allow the operating system to manage the paging in of various parts of the file.

Why is the amount allocated memory so high? If memory allocations take a reasonable amount of time then you will find doing it in memory is far far quicker - my approach would be to do it in memory, and try to find a way to reduce the memory usage to the point where its quick again.

As I see the situation, you either manage the paging yourself or let the operating system manage the paging for you. In most cases I would suggest letting the operating system handle the paging (use virtual memory). Since I have a distrust of MS operating systems, I cannnot recommend this technique, although your mileage may vary.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js