Win32 Finding all allocated memory - c++

I would like to know if my assumption is correct. In my project I need to know exactly what memory my process and its child processes have allocated. After some research I came across the Win32 API GetProcessHeaps(); the documentation tells me I can enumerate all heaps that the process has allocated and get their sizes. However, I ran into another question: where is the stack of each thread located? I experimented with GetCurrentThreadStackLimits(), which returns the start and end addresses, but I was not able to read directly from this memory.
Maybe someone can point me in the right direction, or explain a bit about how to locate each chunk of memory that the process uses.
Basically, a debugger somehow knows which parts of memory you have reserved and which parts you have not. Some parts of virtual memory you can read, and some you just can't, because you haven't reserved them and they are not mapped to physical memory.
The question is mostly about enumerating allocations, determining their location and size, and reading from them, just like a debugger does.
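One way to get a debugger-style view of the address space is to walk it with VirtualQuery, which reports, for any address, the region it falls in and whether that region is free, reserved, or committed. The sketch below assumes the current process; Region and list_regions are names made up for this example, while VirtualQuery and MEMORY_BASIC_INFORMATION are the real Win32 pieces. For a child process the same walk should work with VirtualQueryEx and a process handle, and the contents would then be read with ReadProcessMemory. It may also explain the stack problem: the range from GetCurrentThreadStackLimits covers the whole reserved stack, and the low end of it is normally reserved but not committed (plus a guard page), so it cannot be read directly.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <vector>

// One contiguous run of pages that share the same state and protection.
// (Hypothetical type for this example, not part of the Win32 API.)
struct Region {
    const void* base;
    SIZE_T      size;
    bool        readable;  // committed, and neither no-access nor guard pages
};

// Walks the address space of the current process from address 0 upwards and
// collects every region that is reserved or committed.
std::vector<Region> list_regions() {
    std::vector<Region> regions;
    MEMORY_BASIC_INFORMATION mbi;
    const unsigned char* addr = nullptr;

    // VirtualQuery returns 0 once addr is past the last usable address.
    while (VirtualQuery(addr, &mbi, sizeof(mbi)) != 0) {
        if (mbi.State != MEM_FREE) {
            bool readable = mbi.State == MEM_COMMIT &&
                            !(mbi.Protect & (PAGE_NOACCESS | PAGE_GUARD));
            regions.push_back({mbi.BaseAddress, mbi.RegionSize, readable});
        }
        const unsigned char* next =
            static_cast<const unsigned char*>(mbi.BaseAddress) + mbi.RegionSize;
        if (next <= addr) break;  // stop if the address would wrap around
        addr = next;
    }
    return regions;
}
#endif
```

Calling list_regions() and printing base, size, and readable for each entry gives a rough memory map; only the regions flagged readable are safe to dereference.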

Related

What part of the process virtual memory does Windows Task Manager display

My question is a bit naive. I'd like an overview that is as simple as possible, and I couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly the memory displayed by default in the "memory" column of Windows Task Manager is.
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also, I'm focussed on the big picture and mainly care about things at GB level.
As far as I know, the memory reserved by the process, called "virtual memory", is stored partly in main memory (RAM) and partly on disk. The system decides what goes where. It basically keeps in RAM the parts of the virtual memory that are accessed sufficiently frequently by the process. A process can reserve more virtual memory than the RAM available in the computer.
From a developer point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X() for example). I guess the system has no awareness of what part of the virtual memory is allocated since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by the task manager is essentially the part of the virtual memory being stored in RAM by the system. Q: Is it correct? And is there a simple way to know the total virtual memory reserved by the process?
Memory on Windows is... extremely complicated, and asking 'how much memory does my process use' is effectively a nonsensical question. To answer your questions, let's get a little background first.
Memory on Windows is reserved via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Reserved memory just uses up address space and so isn't interesting. Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources, but not in the way you'd think. When you commit, Windows does a few sums, basically working out (total physical RAM + total swap - current commit), and lets you allocate the memory if there's enough free. BUT the Windows memory manager doesn't actually give you physical RAM until you actually use it.
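The reserve, commit, and first-touch steps can be sketched as follows. This is only an illustration under the assumptions in the comments; reserve_then_commit is a made-up helper name, while VirtualAlloc and VirtualFree are the actual Win32 calls.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cstddef>

// Reserves reserve_bytes of address space, commits only the first
// commit_bytes, touches one byte of the committed part, then releases it all.
// Returns true if every step succeeded. (Hypothetical helper.)
bool reserve_then_commit(std::size_t reserve_bytes, std::size_t commit_bytes) {
    // Step 1: claim address space only; nothing is charged against commit.
    void* base = VirtualAlloc(nullptr, reserve_bytes, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) return false;

    // Step 2: commit a prefix; this is what counts against the commit limit.
    void* committed = VirtualAlloc(base, commit_bytes, MEM_COMMIT, PAGE_READWRITE);
    bool ok = committed != nullptr;

    if (ok) {
        // Step 3: first touch; only now does a physical page get assigned.
        static_cast<volatile char*>(committed)[0] = 1;
    }

    // MEM_RELEASE takes size 0 and the base address of the original reservation.
    return VirtualFree(base, 0, MEM_RELEASE) != 0 && ok;
}
#endif
```

For example, reserve_then_commit(64u * 1024 * 1024, 4096) reserves 64 MB but commits a single 4 KB page, so commit-size in Task Manager should grow by roughly one page rather than 64 MB.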
Later, however, if Windows is tight on physical RAM it'll swap some of your RAM out to disk (it may also compress it, throw away unused pages, throw away anything directly mapped from a file, and apply other optimisations). This means the total commit and the total physical RAM usage of your program may be wildly different. Both numbers are useful depending on what you're measuring.
There's one last large caveat: memory that is shared. When you load DLLs, the code and the read-only data [and maybe even the read/write section, but this is COW'd] can be shared with other programs. This means that your app requires that memory, but you cannot count it against just your app; after all, it can be shared and so doesn't take up as much physical memory as a naive count would suggest.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses, and you see none of it: you ask for memory and use it, and it's just as optimal as possible.
You can see this by going to the details tab and looking at the various columns; commit-size and working-set are really useful. If you just look at the main window in Task Manager and it shows a single value, I'd hope you now understand that a single value for memory used has to be some kind of compromise, as it's not a question that makes sense.
Now to answer your questions
Firstly, the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is whether the heap implementation you (or more likely the CRT) are using has kept some freed memory around that it hasn't released back to the operating system. Heaps often do this as an optimisation: asking for memory from the OS and freeing it back to the OS is a fairly expensive operation (and can only be done in large chunks known as pages), so most of them keep some around.
Second question: don't use that value. Go to details and use the values there, as only you know what you actually want to ask.
EDIT:
For your comment: yes, but this depends on the size of the allocation. If you allocate a large block of memory (say >= 1MB) then the heap in the CRT generally defers the allocation directly to the operating system, so freeing individual ones will actually free them. For small allocations the heap in the CRT asks for pages of memory from the operating system and then subdivides them to hand out as allocations. So if you then free every other one of those you'll be left with holes, and the heap cannot give those holes back to the OS, as the OS generally only works in whole pages. Anything you see in Task Manager will therefore show that all the memory is still used. Remember this memory isn't lost or leaked; it's just effectively pooled and will be used again if allocations ask for that size. If you care about this memory you can use the CRT heap statistics family of functions to keep an eye on it, specifically _CrtMemDumpStatistics.
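To make the "holes" point concrete, here is a toy model rather than the CRT's real algorithm. It assumes fixed-size blocks packed into 4 KB pages; releasable_pages and free_every_other are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models a heap that carves fixed-size blocks out of whole pages.
// freed[i] says whether block i has been freed. A page can go back to the OS
// only when every block inside it is free. Assumes block_size divides
// page_size evenly, which is a simplification.
std::size_t releasable_pages(const std::vector<bool>& freed,
                             std::size_t block_size,
                             std::size_t page_size) {
    assert(block_size > 0 && page_size % block_size == 0);
    const std::size_t per_page = page_size / block_size;
    std::size_t releasable = 0;
    for (std::size_t start = 0; start + per_page <= freed.size(); start += per_page) {
        bool all_free = true;
        for (std::size_t i = start; i < start + per_page; ++i) {
            if (!freed[i]) { all_free = false; break; }
        }
        if (all_free) ++releasable;
    }
    return releasable;
}

// Marks every other block out of count as freed.
std::vector<bool> free_every_other(std::size_t count) {
    std::vector<bool> freed(count, false);
    for (std::size_t i = 0; i < count; i += 2) freed[i] = true;
    return freed;
}
```

With 256 blocks of 64 bytes (four pages), freeing every other block returns half the bytes to the heap, yet releasable_pages reports zero pages that could be handed back to the OS.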

Do pictures ever get stored in RAM?

I am a beginner C++ programmer.
I wrote a simple program that creates a char array (the size is user's choice) and reads what previous information was in it. Often you can find something that makes sense but most of it is just strange characters. I made it output into a binary file.
Why do I often find multiple copies of the alphabet?
Is it possible to find a picture inside of the RAM chunk I retrieved?
I heard about file signatures (headers), which go before any of the data in a file, but do "trailers" go at the back, after all the data?
When you read uninitialized data from memory that you allocated, you'll never see any data from another process. You only ever see data that your own process has written. That is: your code plus all the libraries that you called.
This is a security feature of your kernel: It never leaks information from a process unless it's specifically asked to transfer that information.
If you didn't load a picture in memory, you'll never see one using this method.
Assuming your computer runs Linux, Windows, MacOS or something like that, there will NEVER be any pictures in the memory your process uses unless you loaded them into your process. For security reasons, the memory used by other processes is cleared before it gets given to YOUR process. This is the case for all modern OSes, and has been the case for multi-user OSes (Unix, VAX-VMS, etc.) more or less since they were first invented in the late 1950s or early 1960s, because someone figured out that it's kind of unfun when "your" data is found by someone else who is just out there fishing for it.
Even a process that has ended will have its memory cleared: how would you like it if your password was still stored in memory for someone to find after the program that read the password ended? [Programs that hold highly sensitive data, such as encryption keys or passwords, often clear the memory used to store it manually (that is, in code, rather than waiting for the OS to clear it when the process ends), because the debug functionality described below allows the memory content to be inspected at any time, and the shorter the time the data stays in memory, the less likely a leak of sensitive information.]
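Clearing a secret "manually" might look like the sketch below. It writes through a volatile pointer, a common trick to stop the compiler from dropping the stores as dead code when the buffer is about to be freed; it is not something the language standard strictly guarantees. secure_zero is a name made up here; on Windows, SecureZeroMemory serves the same purpose.

```cpp
#include <cassert>
#include <cstddef>

// Overwrites len bytes at p with zeros. The volatile-qualified pointer is
// there so the stores are treated as observable and not optimised away.
void secure_zero(void* p, std::size_t len) {
    assert(p != nullptr || len == 0);
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < len; ++i) {
        bytes[i] = 0;
    }
}
```

A program would call secure_zero(password, sizeof(password)) as soon as it is done with the password, before the buffer goes out of scope or is freed.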
Once memory has been allocated to your process and freed again, it will contain whatever happens to be in that memory, as clearing it takes extra time, and most of the time you'd want to fill it with something else anyway. So it contains whatever it happens to contain, and if you poke around in it, you will potentially "find stuff". But it's all your own process's work.
Most OSes have a way to read what another process is doing as part of the debug functionality (if you run the "debugger" in your system, it will of course run as a separate process, but it needs to be able to access your program when you debug it, so there need to be ways to read the memory of that process), but that requires a little more effort than just calling new or malloc (and you will either need extra permissions (superuser, administrator, etc.), or be the owner of the other process too).
Of course, if your computer is running DOS or CP/M, it has no such security features, and you get whatever happens to be in the memory (and you could also just make up a pointer to an arbitrary address and read it, as long as you stay within the memory range of the system).

Dynamic memory allocation and memory block metadata

I have a question about low level stuff of dynamic memory allocation.
I understand that there may be different implementations, but I need to understand the fundamental ideas.
So,
when a modern OS memory allocator or the equivalent allocates a block of memory, this block needs to be freed.
But, before that happens, some system needs to exist to control the allocation process.
I need to know:
How this system keeps track of allocated and unallocated memory. I mean, the system needs to know what blocks have already been allocated and what their size is to use this information in allocation and deallocation process.
Is this process supported by modern hardware, like allocation bits or something like that?
Or is some kind of data structure used to store allocation information.
If there is a data structure, how much memory does it use compared to the allocated memory?
Is it better to allocate memory in big chunks rather than small ones and why?
Any answer that can help reveal fundamental implementation details is appreciated.
If there is a need for code examples, C or C++ will be just fine.
"How this system keeps track of allocated and unallocated memory." For non-embedded systems with operating systems, a virtual page table, which the OS is in charge of organizing (with hardware TLB support of course), tracks the memory usage of programs.
AS FAR AS I KNOW (and the community will surely yell at me if I'm mistaken), tracking individual malloc() sizes and locations has a good number of implementations and is runtime-library dependent. Generally speaking, whenever you call malloc(), the size and location are stored in a table. Whenever you call free(), the table entry for the provided pointer is looked up. If it is found, that entry is removed. If it is not found, what happens is up to the implementation: the C standard leaves freeing an unknown pointer undefined, so the call may be ignored, may corrupt the bookkeeping, or may abort the program.
When all malloc() entries in a virtual page are freed, that virtual page is then released back to the OS (this also implies that free() does not always release memory back to the OS since the virtual page may still have other malloc() entries in it). If there is not enough space within a given virtual page to support another malloc() of a specified size, another virtual page is requested from the OS.
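The table idea described above can be sketched as a thin wrapper around malloc and free. This is purely illustrative: TrackingAllocator is a made-up class, and real runtime libraries usually keep this bookkeeping inside or next to the blocks rather than in a hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Records every live allocation (address -> size) in a side table.
class TrackingAllocator {
public:
    TrackingAllocator() = default;
    TrackingAllocator(const TrackingAllocator&) = delete;
    TrackingAllocator& operator=(const TrackingAllocator&) = delete;

    // Anything still live when the tracker dies is freed here.
    ~TrackingAllocator() {
        for (const auto& entry : table_) std::free(entry.first);
    }

    void* allocate(std::size_t size) {
        assert(size > 0 && "zero-size requests are not modelled");
        void* p = std::malloc(size);
        if (p) table_[p] = size;  // remember location and size
        return p;
    }

    // Returns false, and does nothing, if p is not a live allocation.
    // Ignoring the call is a choice made for this sketch only.
    bool release(void* p) {
        auto it = table_.find(p);
        if (it == table_.end()) return false;
        table_.erase(it);
        std::free(p);
        return true;
    }

    std::size_t live_blocks() const { return table_.size(); }

    std::size_t live_bytes() const {
        std::size_t total = 0;
        for (const auto& entry : table_) total += entry.second;
        return total;
    }

private:
    std::unordered_map<void*, std::size_t> table_;
};
```

After allocate(100) and allocate(28), live_bytes() reports 128; releasing the first block drops it to 28, and releasing a pointer the table has never seen simply returns false.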
Embedded processors usually don't have operating systems, virtual page tables, or multiple processes. In this case, virtual memory is not used. Instead, the entire memory of the embedded processor is treated like one large virtual page (although the addresses are actually physical addresses) and memory management follows a process similar to the one described above.
Here is a similar stack overflow question with more in-depth answers.
"Is it better to allocate memory in big chunks rather than small ones and why?" Allocate as much memory as you need, no more and no less. Compiler optimizations are very smart, and memory will almost always be managed more efficiently (i.e. reducing memory fragmentation) than the programmer can manually do. This is especially true in a non-embedded environment.
Here is a similar stack overflow question with more in-depth answers (note that it pertains to C and not C++, however it is still relevant to this discussion).
Well, there is more than one way to achieve that.
I once had to write a malloc() (and free()) implementation for educational purposes.
This is from my experience, and real-world implementations surely vary.
I used a doubly linked list.
A memory chunk returned to the user after calling malloc() was in fact preceded by a struct containing information relevant to my implementation (i.e. the next and prev pointers, and an is_used byte).
So when a user requested N bytes I allocated N + sizeof(my_struct) bytes, hiding the next and prev pointers at the beginning of the chunk and returning what was left to the user.
Surely, this is a poor design for a program that makes a lot of small allocations (because each allocation takes up N + 2 pointers + 1 byte).
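A minimal sketch of that header-in-front-of-the-block design, under some assumptions of my own: a fixed 4 KB arena instead of memory obtained from the OS, first-fit reuse of freed blocks, and an extra size field in the header (the description above only mentions next, prev, and is_used). BlockHeader and ToyHeap are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Bookkeeping hidden in front of every block handed to the user.
// alignas keeps sizeof(BlockHeader) a multiple of the strictest alignment.
struct alignas(std::max_align_t) BlockHeader {
    BlockHeader* next;
    BlockHeader* prev;
    std::size_t  size;     // usable bytes after the header (added for the sketch)
    bool         is_used;
};

class ToyHeap {
public:
    void* allocate(std::size_t n) {
        n = round_up(n);
        // First fit: reuse a freed block that is large enough.
        for (BlockHeader* b = head_; b; b = b->next) {
            if (!b->is_used && b->size >= n) {
                b->is_used = true;
                return b + 1;
            }
        }
        // Otherwise carve N + sizeof(header) bytes from the arena.
        if (top_ + sizeof(BlockHeader) + n > sizeof(arena_)) return nullptr;
        BlockHeader* b = new (arena_ + top_) BlockHeader{nullptr, tail_, n, true};
        top_ += sizeof(BlockHeader) + n;
        if (tail_) tail_->next = b; else head_ = b;
        tail_ = b;
        return b + 1;  // user memory starts right after the header
    }

    void release(void* p) {
        if (!p) return;
        BlockHeader* b = static_cast<BlockHeader*>(p) - 1;  // step back to the header
        assert(b->is_used);
        b->is_used = false;
    }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    alignas(std::max_align_t) unsigned char arena_[4096];
    BlockHeader* head_ = nullptr;
    BlockHeader* tail_ = nullptr;
    std::size_t  top_  = 0;
};
```

The overhead is visible directly: two consecutive 10-byte allocations end up at least sizeof(BlockHeader) + 10 bytes apart, which is why this layout is expensive for many tiny blocks.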
For a real-world implementation, you can take a look at the code of a good, well-known memory allocator.
Normally there exist two different layers.
One layer lives at application level, usually as part of the C standard library. This is what you call through functions like malloc and free (or operator new in C++, which in turn usually calls malloc). This layer takes care of your allocations, but does not know about memory or where it comes from.
The other layer, at OS level, does not know and does not care anything about your allocations. It only maintains a list of fixed-size memory pages that have been reserved, allocated, and accessed, and with each page information such as where it maps to.
There are many different implementations for either layer, but in general it works like this:
When you allocate memory, the allocator (the "application level part") looks whether it has a matching block somewhere in its books that it can give to you (some allocators will split a larger block in two, if need be).
If it doesn't find a suitable block, it reserves a new block (usually much larger than what you ask for) from the operating system. sbrk or mmap on Linux, or VirtualAlloc on Windows would be typical examples of functions it might use for that effect.
This does very little apart from showing intent to the operating system, and generating some page table entries.
The allocator then (logically, in its books) splits up that large area into smaller pieces according to its normal mode of operation, finds a suitable block, and returns it to you. Note that this returned memory does not necessarily even exist as physical memory (though most allocators write some metadata into the first few bytes of each allocated unit, so they necessarily pre-fault the pages).
In the meantime, invisibly, a background task zeroes out memory pages that were once in use by some process but have been freed. This happens all the time, on a tentative basis, since sooner or later someone will ask for memory (often, that's what the idle task does).
Once you access an address in the page that contains your allocated block for the first time, you generate a fault. The page table entry of this as yet non-existent page (it logically exists, just not physically) is replaced with a reference to a page from the pool of zero pages. In the uncommon case that there is none left, for example if huge amounts of memory are being allocated all the time, the OS swaps out a page which it believes will not be accessed any time soon, zeroes it, and returns this one.
Now the page becomes part of your working set, it corresponds to actual physical memory, and it counts towards your process's quota. While your process is running, pages may be moved in and out of your working set, or may be paged out and in, as you exceed certain limits, and according to how much memory is needed and how it is accessed.
Once you call free, the allocator puts the freed area back into its books. It may tell the OS that it does not need the memory any more instead, but usually this does not happen as it is not really necessary and it is more efficient to keep around a little extra memory and reuse it. Also, it may not be easy to free the memory because usually the units that you allocate/deallocate do not directly correspond with the units the OS works with (and, in the case of sbrk they'd need to happen in the correct order, too).
When the process ends, the OS simply throws away all page table entries and adds all pages to the list of pages that the idle task will zero out. So the physical memory becomes available to the next process asking for some.

Mapping of several big files into memory

In our application we have to be able to map several (i.e. maybe up to 4) files into memory (via MapViewOfFile). For a long time this was not a problem, but as the files have been getting bigger and bigger over the last few years, memory fragmentation now prevents us from mapping those big files (the files will be about 200 MB). The problem may already occur when no other files are loaded at that moment.
I am now looking for a way to make sure that the mapping is always successful. Therefore I wanted to reserve a block of memory at program start only for the mapping, which would then suffer much less from fragmentation.
My first approach was to HeapCreate a private heap; I would then HeapAlloc a block of memory large enough to hold the mapping for one file and then use MapViewOfFileEx with the address of that block. Of course the address would have to match the memory allocation granularity. But the mapping still failed with error code ERROR_INVALID_ADDRESS (487).
Next I tried the same thing with VirtualAlloc. My understanding was that when I pass the parameter MEM_RESERVE I would then be able to use that memory for whatever I wanted, e.g. to map a view of a file. But I found out that this is not possible (same error code as above) until I completely free the whole block with VirtualFree again. Therefore there would be no reserved memory left for the next files.
I'm already using the low fragmentation heap feature and it is of nearly no use to us. Rewriting our code to use only smaller views of the files is not an option at the moment. I also took a look at this post, Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure?, but didn't find it very useful and was hoping for another possibility.
Do you have any suggestions what I can do or where my design may be wrong?
Thank you.
Well, the documentation for MapViewOfFileEx is clear: "The suggested address is used to specify that a file should be mapped at the same address in multiple processes. This requires the region of address space to be available in all involved processes. No other memory allocation can take place in the region that is used for mapping, including the use of the VirtualAlloc"
The low fragmentation heap is intended to prevent even relatively small allocations from failing, i.e. it avoids 1-byte holes so that 2-byte allocations remain possible for longer. Your allocations are not small by 32-bit standards.
Realistically, this is going to hurt. If you really really need it, reimplement memory mapped files. All the necessary functions are available. Use a vectored exception handler to page in the source, and use QueryWorkingSet to figure out if pages are dirty.

What kind of book keeping does the OS do when we use new to allocate memory?

Besides remembering the address of the object, I think the OS also needs to record how large the block of memory is, so that when we use delete, the OS will know how much memory to free.
Can anyone tell me more details about this? What other information is recorded? And where is that information stored? What does the OS do after you delete the memory?
As already noted, new is a library function, not an OS feature.
The general case is approximately like this:
The C++ compiler translates the new keyword into function calls to malloc() (or equivalent)
The allocator keeps a list of free blocks of memory and searches it for the best match.
Typically, the 'best' match is bigger than the amount asked for by your program. If so, the allocator splits the block, marks one part with the size (and maybe some other metadata), puts the rest back into the free list, and returns the allocated block to your program.
If no appropriate free block is found, the allocator asks the OS for a chunk of memory. There are several ways to do this, but it's typically considered a slow operation, so it asks in bigger steps (at least one page at a time, usually 4KB). When it gets the new free block, it splits off the requested size and puts the rest in the free list.
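The steps above can be sketched as a small simulation. It works on offsets into an imaginary address range, so no real memory is involved, and the "OS request" is just a counter; BestFitHeap and kChunk are names assumed for the example, and freeing is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

constexpr std::size_t kChunk = 4096;  // stand-in for the OS page size

class BestFitHeap {
public:
    // Returns the offset of a block of exactly size bytes.
    std::size_t allocate(std::size_t size) {
        assert(size > 0);
        // Search the free list for the smallest block that still fits.
        auto best = free_.end();
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= size &&
                (best == free_.end() || it->second < best->second)) {
                best = it;
            }
        }
        if (best == free_.end()) {
            // Nothing fits: ask the "OS" for whole pages (the slow path).
            // The new chunk is not merged with older free blocks here.
            std::size_t bytes = (size + kChunk - 1) / kChunk * kChunk;
            best = free_.emplace(top_, bytes).first;
            top_ += bytes;
            ++os_requests_;
        }
        std::size_t offset = best->first;
        std::size_t block  = best->second;
        free_.erase(best);
        if (block > size) {
            free_[offset + size] = block - size;  // split; the rest stays free
        }
        used_[offset] = size;  // the size mark kept for the allocated block
        return offset;
    }

    std::size_t os_requests() const { return os_requests_; }

    std::size_t free_bytes() const {
        std::size_t total = 0;
        for (const auto& block : free_) total += block.second;
        return total;
    }

    std::size_t size_of(std::size_t offset) const { return used_.at(offset); }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size of free block
    std::map<std::size_t, std::size_t> used_;  // offset -> size of used block
    std::size_t top_ = 0;
    std::size_t os_requests_ = 0;
};
```

Allocating 100 and then 200 bytes triggers a single 4 KB request and leaves 3796 bytes on the free list; a following 5000-byte request does not fit and forces a second, two-page request.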
The OS is the one controlling the MMU (Memory Management Unit) of the processor. This unit is the one that translates the linear addresses as seen by the currently running process into the physical addresses of RAM pages. This allows the OS the flexibility it needs to allocate and deallocate RAM pages to each process.
Each process has a different memory map, that allows each one to 'see' a linear memory space while at the same time keeping each process isolated from the others. The OS is the one who loads and unloads the map into the MMU at each process switch. Allocating a new page to a process ultimately means adding it to the memory map of the process.
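As a rough illustration of what a per-process memory map does, the sketch below translates addresses through a table of page mappings. It assumes 4 KB pages and uses a hash map where real hardware walks multi-level page tables; AddressSpace and kPageSize are names invented for the example.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr std::uint64_t kPageSize = 4096;  // assumed page size

// One process's memory map: virtual page number -> physical frame number.
struct AddressSpace {
    std::unordered_map<std::uint64_t, std::uint64_t> pages;

    // What "allocating a new page to a process" boils down to.
    void map_page(std::uint64_t virtual_page, std::uint64_t frame) {
        pages[virtual_page] = frame;
    }

    // Translates a virtual address into a physical one. Returns false for an
    // unmapped page, which on real hardware would raise a page fault.
    bool translate(std::uint64_t vaddr, std::uint64_t& paddr) const {
        auto it = pages.find(vaddr / kPageSize);
        if (it == pages.end()) return false;
        paddr = it->second * kPageSize + vaddr % kPageSize;
        return true;
    }
};
```

Two AddressSpace objects can map the same virtual page to different frames, so the same pointer value in two processes refers to different RAM, which is the isolation described above.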
can anyone tell me more detail about this?
These are all details that are highly dependent on the OS and compiler and I can only provide fairly broad answers (but you really shouldn't have to worry about this, unless you are thinking of going into that line of work, of course).
what other information are recorded?
Generally, freestore memory is commonly referred to as the "heap". Despite the name, it is usually not the heap data structure (the priority-queue one); allocators typically keep lists or trees of free memory blocks. The actual implementation and the algorithms used are very complex because they need to perform very well (i.e. free blocks need to be found very quickly when requesting new memory, and newly freed memory has to be merged back in very quickly as well), and then there are all the fragmentation issues and so on. Look at the wiki on Buddy memory allocation for a very simplified version.
So, there is quite a bit more than just the list of allocated pointers and their corresponding memory-block sizes. In fact, the list of available memory blocks is far more important.
and where are those information stored?
The Heap is a part of the running program (in fact, each process, and even each loaded module, will have one or more such structures). The heap is not residing in the OS (and I'm not even sure it needs to even be given by the OS, at least not for all OSes). However, the heap obviously asks the OS to provide it with large blocks of memory that it can incorporate into its tree of free memory-blocks, whenever capacity is exhausted by the memory requests of your program.
what does the OS do after you delete the memory?
Generally, when you delete memory, it simply gets added to the list of free (or available) memory blocks, and possibly gets merged with adjacent free memory blocks if there are any. There is no "filling the memory with zeros" or any other actual operation on the memory; that would be unnecessary and a waste of processing time.
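Merging a freed block with its free neighbours can be sketched like this. It is a simplified model working on offsets, with the free list kept in a map ordered by start address so that neighbours are easy to find; FreeList is a hypothetical class, and real allocators often use boundary tags inside the blocks instead.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

class FreeList {
public:
    // Returns a block to the list and merges it with adjacent free blocks.
    // The memory itself is never touched or zeroed.
    void release(std::size_t offset, std::size_t size) {
        assert(size > 0);
        auto next = blocks_.lower_bound(offset);

        // Merge with the free block that ends exactly where this one starts.
        if (next != blocks_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                size += prev->second;
                blocks_.erase(prev);
            }
        }
        // Merge with the free block that starts exactly where this one ends.
        if (next != blocks_.end() && offset + size == next->first) {
            size += next->second;
            blocks_.erase(next);
        }
        blocks_[offset] = size;
    }

    std::size_t block_count() const { return blocks_.size(); }

    std::size_t largest() const {
        std::size_t best = 0;
        for (const auto& block : blocks_) {
            if (block.second > best) best = block.second;
        }
        return best;
    }

private:
    std::map<std::size_t, std::size_t> blocks_;  // start offset -> size
};
```

Releasing the ranges [0, 16) and [32, 48) leaves two separate free blocks; releasing [16, 32) afterwards joins all three into a single 48-byte block, which is what keeps fragmentation in check.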