Find which heap an address belongs to? - c++

I'm creating a memory management system and I need a way to find which heap an allocation of mine lives in.
For example, if I call HeapAlloc with the heap returned by GetProcessHeap(), I would expect the allocation to land in that heap, but it appears that it doesn't.
When I use GetProcessHeaps to run through the heaps, I find that the process heap is at something like 0x00670000 and my allocated address is at something like 0x0243a385 (in other words, nowhere near it).
And sometimes it can actually be before it (something like 0x004335ab).
So I'd like to know if there is a way to reliably get the starting address (and the end address, if at all possible) of the heap that I made the allocation in.

Your understanding of heaps is wrong. In general, modern heaps do not rely on allocating a large chunk of data and then parcelling it up with each allocation as you assume (although they may use this as one of their strategies). This means there is no well defined 'start' or 'end' of a heap. As an example, by default, with Windows heaps large allocations always go direct to the operating system via VirtualAlloc(...) which means that allocations from one heap may interleave with allocations from another.
If you really need to work out which heap an allocation comes from, there is a way, although it's really slow, so you shouldn't rely on it except for debugging, logging, or similar. For normal code you should already know where allocations came from, either deduced from context or by actually storing it.
Warnings aside, you can use HeapWalk to enumerate all allocations from each heap looking for the one you want.
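The "actually storing it" option can be sketched roughly as follows. This is only an illustration under assumptions: tagged_alloc, owner_of, tagged_free and BlockHeader are hypothetical names, and std::malloc stands in for HeapAlloc so the snippet builds anywhere. On Windows the stored owner would be the HANDLE that was passed to HeapAlloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical header placed in front of every block. `owner` would be the
// heap HANDLE on Windows. alignas keeps the user pointer suitably aligned.
struct alignas(std::max_align_t) BlockHeader {
    void* owner;
};

// Allocate `size` bytes and remember which heap (`owner`) they came from.
// std::malloc is a stand-in for HeapAlloc(owner, 0, ...).
void* tagged_alloc(void* owner, std::size_t size) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;
    BlockHeader* header = new (raw) BlockHeader{owner};
    return header + 1;  // user memory starts right after the header
}

// Recover the owner recorded by tagged_alloc in constant time.
void* owner_of(void* user_ptr) {
    assert(user_ptr != nullptr);
    return (static_cast<BlockHeader*>(user_ptr) - 1)->owner;
}

// Release a block obtained from tagged_alloc.
void tagged_free(void* user_ptr) {
    if (user_ptr != nullptr) std::free(static_cast<BlockHeader*>(user_ptr) - 1);
}
```

The cost is one header per allocation, in exchange for a lookup that does not need to walk any heap.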

Related

Performance and security in C++ when avoiding use of pointer

I'm trying to create a class in C++ with an idea of absolute encapsulation and efficiency for the sake of practice. In my case this means every data member is supposed to be inside the class with no pointers pointing outside (e.g. to dynamically allocated storage).
For example, I'm using
char name [10];
instead of
std::string name;
char* name;
My idea is that objects of the class are created as completely enclosed blocks on the stack. Performance should also improve since, if I remember correctly, access to the stack is considerably faster than access to the heap.
Am I correct in those assumptions?
And is this idea of absolute encapsulation sensible outside practice? (For example to ensure safety, since there seems to be no risk of memory mismanagement or buffer overflow)
access to the stack is considerably faster than to the heap
This is false: an access to memory is an access to memory. Two things might have confused you here.
First, it is true that different types of memory can be accessed at different speeds. For example, the disk is usually the slowest (without talking about networking, which complicates things even further), while registers are usually the fastest. In between is the main memory, or RAM, where both the stack and the heap live. And then you can have caches, different types of disks, and so on.
Second, stack allocation is indeed faster than heap allocation, just because the allocation scheme is simpler. With the stack, as the name implies, you can only allocate and deallocate at the end, meaning you need to follow a specific order. With the heap, you can allocate pretty much anywhere, meaning that you can deallocate at any point and in any order. This implies some kind of management of the memory that comes with its own problems, for example fragmentation.
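To make the "simpler allocation scheme" concrete, here is a toy sketch of stack-style allocation over a fixed buffer. StackArena is a hypothetical class written for illustration (it is not how the real call stack is implemented, and it ignores alignment), but it shows why allocation is just a pointer bump and why releases have to happen in reverse order.

```cpp
#include <cassert>
#include <cstddef>

// Toy LIFO allocator over a fixed buffer (alignment is ignored for brevity).
// Allocating is a pointer bump; releasing moves the top back, so blocks
// must be released in the reverse order of allocation.
class StackArena {
public:
    void* push(std::size_t bytes) {
        if (bytes > sizeof(buffer_) - top_) return nullptr;  // out of space
        void* p = buffer_ + top_;
        top_ += bytes;
        return p;
    }

    // Pop the most recent block; `bytes` must match the size given to push.
    void pop(void* p, std::size_t bytes) {
        assert(static_cast<unsigned char*>(p) + bytes == buffer_ + top_);
        top_ -= bytes;
    }

    std::size_t used() const { return top_; }

private:
    unsigned char buffer_[1024];
    std::size_t top_ = 0;
};
```

A general-purpose heap cannot get away with a single `top_` counter, because blocks can be freed in any order and the resulting holes have to be tracked.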
is this idea of absolute encapsulation sensible outside practice?
First of all, only using the stack is impossible in practice simply because of its limited size. While this size can vary in practice, it's unlikely to be more than 8MB currently. As soon as you need to load a file larger than that, you cannot do it on the stack.
However, even if stack size were practically unlimited, you would still need to deallocate things in the reverse order that you allocated them, otherwise it is no longer a stack. Many things are infeasible that way. For example, as soon as you want interactivity, you need some sort of event processing (to respond to user input), and this is usually done with a queue, which is more or less the opposite of a stack. Sure, you could allocate an insanely large queue, but that's infeasible in practice. Another example that comes to mind is networking. If you want to deal with multiple connections at once (like a web browser, for example), you need to deal with the memory associated with each one independently. Again, you could allocate an insane amount of memory to each connection, but again, that's infeasible in practice.
Also, note that encapsulation does not mean "no pointers to dynamically allocated memory". Instead, "hidden memory management" would be closer to the meaning of this concept.

Is there any benefit to use multiple heaps for memory management purposes?

I am a student of a system software faculty. Now I'm developing a memory manager for Windows. Here's my simple implementation of malloc() and free():
#include <windows.h>
// One private heap, created once and shared by hmalloc/hfree.
HANDLE heap = HeapCreate(0, 0, 0);
void* hmalloc(size_t size)
{
    return HeapAlloc(heap, 0, size);
}
void hfree(void* memory)
{
    HeapFree(heap, 0, memory);
}
int main()
{
    int* ptr1 = (int*)hmalloc(100 * sizeof(int));
    int* ptr2 = (int*)hmalloc(100 * sizeof(int));
    int* ptr3 = (int*)hmalloc(100 * sizeof(int));
    // Blocks can be freed in any order.
    hfree(ptr2);
    hfree(ptr3);
    hfree(ptr1);
    return 0;
}
It works fine, but I can't see whether there is a reason to use multiple heaps. I can allocate memory in the heap and get the address of the allocated chunk, but here I use ONE heap. Is there a reason to use several? Maybe for multi-threaded or multi-process applications? Please explain.
The main reason for using multiple heaps or custom allocators is better memory control. After lots of new/delete calls, memory can become fragmented, and the application loses performance (and also consumes more memory). Using the memory in a more controlled environment can reduce heap fragmentation.
Another use is preventing memory leaks in the application: you can just free the entire heap you allocated, and you don't need to bother with freeing each object allocated there.
Another use is for tightly packed objects: if you have, for example, a list, you could allocate all the nodes in a smaller dedicated heap, and the app will gain performance because there will be fewer cache misses when iterating over the nodes.
Edit: memory management is, however, a hard topic, and in some cases it is not done right. Andrei Alexandrescu gave a talk at one point in which he said that for some applications, replacing the custom allocator with the default one increased performance.
This is a good link that elaborates on why you may need multiple heaps:
https://caligari.dartmouth.edu/doc/ibmcxx/en_US/doc/libref/concepts/cumemmng.htm
"Why Use Multiple Heaps?
Using a single runtime heap is fine for most programs. However, using multiple
heaps can be more efficient and can help you improve your program's performance
and reduce wasted memory for a number of reasons:
1- When you allocate from a single heap, you may end up with memory blocks on
different pages of memory. For example, you might have a linked list that
allocates memory each time you add a node to the list. If you allocate memory for
other data in between adding nodes, the memory blocks for the nodes could end up
on many different pages. To access the data in the list, the system may have to
swap many pages, which can significantly slow your program.
With multiple heaps, you can specify which heap you allocate from. For example,
you might create a heap specifically for the linked list. The list's memory blocks
and the data they contain would remain close together on fewer pages, reducing the
amount of swapping required.
2- In multithread applications, only one thread can access the heap at a time to
ensure memory is safely allocated and freed. For example, say thread 1 is
allocating memory, and thread 2 has a call to free. Thread 2 must wait until
thread 1 has finished its allocation before it can access the heap. Again, this
can slow down performance, especially if your program does a lot of memory
operations.
If you create a separate heap for each thread, you can allocate from them
concurrently, eliminating both the waiting period and the overhead required to
serialize access to the heap.
3- With a single heap, you must explicitly free each block that you allocate. If you
have a linked list that allocates memory for each node, you have to traverse the
entire list and free each block individually, which can take some time.
If you create a separate heap for that linked list, you can destroy it with a
single call and free all the memory at once.
4- When you have only one heap, all components share it (including the IBM C and
C++ Compilers runtime library, vendor libraries, and your own code). If one
component corrupts the heap, another component might fail. You may have trouble
discovering the cause of the problem and where the heap was damaged.
With multiple heaps, you can create a separate heap for each component, so if
one damages the heap (for example, by using a freed pointer), the others can
continue unaffected. You also know where to look to correct the problem."
A reason would be the scenario that you need to execute a program internally e.g. running simulation code. By creating your own heap you could allow that heap to have execution rights which by default for security reasons is turned off. (Windows)
You have some good thoughts, and this would work for C, but in C++ you have destructors, and it is VERY important that they run.
You can think of all types as having constructors/destructors; some of them just logically "do nothing".
This is about allocators. See "The buddy algorithm" which uses powers of two to align and re-use stuff.
If I allocate 4 bytes somewhere, my allocator might allocate a 4kb section just for 4 byte allocations. That way I can fit 1024 4 byte things in the block; if I need more, it adds another block, and so forth.
Ask it for 4kb and it won't allocate that in the 4 byte block; it might have a separate one for larger requests.
This means you can keep big things together. If I allocate 17 bytes, then 13 bytes, then 1 byte, and the 13 byte block gets freed, I can only put something of <=13 bytes in that gap.
Hence the buddy system and powers of 2, easy to do using lshifts, if I want a 2.5kb block, I allocate it as the smallest power of 2 that'll fit (4kb in this case) that way I can use the slot afterwards for <=4kb items.
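The rounding step described above can be sketched in a few lines. round_up_pow2 is a hypothetical helper name, and this shows only the size-class calculation, not a full buddy allocator (no splitting or merging of buddies).

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two that is >= bytes, found by repeated left shifts.
std::size_t round_up_pow2(std::size_t bytes) {
    assert(bytes > 0);
    // Guard against overflow: the result must be representable in size_t.
    assert(bytes <= (static_cast<std::size_t>(-1) >> 1) + 1);
    std::size_t block = 1;
    while (block < bytes) block <<= 1;
    return block;
}
```

With this, a 2.5kb request (2560 bytes) lands in the 4096-byte class, so the slot can later be reused for anything up to 4kb.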
This is not for garbage collection, this is just keeping things more compact and neat, using your own allocator can stop calls to the OS (depending on the default implementation of new and delete they might already do this for your compiler) and make new/delete very quick.
Heap compacting is very different: you need a list of every pointer that points into your heap, or some way to traverse the entire memory graph (as Java does), so that when you move things around to "compact" them you can update everything that pointed to a moved object with its new location.
The only time I ever used more than one heap was when I wrote a program that would build a complicated data structure. It would have been non-trivial to free the data structure by walking through it and freeing the individual nodes, but luckily for me the program only needed the data structure temporarily (while it performed a particular operation), so I used a separate heap for the data structure so that when I no longer needed it, I could free it with one call to HeapDestroy.
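The "free it with one call" idea does not depend on the Windows heap API. As a portable sketch under assumptions (Arena, Node and build_list are hypothetical names, and the chunk size is arbitrary), a small arena can play the role of the private heap: nodes are carved out of a few large chunks, and destroying the arena releases them all at once, much like HeapDestroy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Toy arena standing in for a private heap: blocks are carved out of large
// chunks and all released together when the arena is destroyed.
class Arena {
public:
    void* allocate(std::size_t bytes) {
        // Round up so every block stays aligned for any fundamental type.
        constexpr std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;
        assert(bytes <= kChunkSize);
        if (chunks_.empty() || used_ + bytes > kChunkSize) {
            chunks_.push_back(std::make_unique<unsigned char[]>(kChunkSize));
            used_ = 0;
        }
        void* p = chunks_.back().get() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    static constexpr std::size_t kChunkSize = 64 * 1024;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
    std::size_t used_ = 0;  // bytes consumed in the newest chunk
};

// Trivially destructible, so skipping per-node destruction is safe here.
struct Node {
    int value;
    Node* next;
};

// Build a linked list of `count` nodes inside `arena`; no per-node free.
Node* build_list(Arena& arena, int count) {
    Node* head = nullptr;
    for (int i = 0; i < count; ++i) {
        head = new (arena.allocate(sizeof(Node))) Node{i, head};
    }
    return head;
}
```

Because the nodes sit next to each other in a handful of chunks, walking the list also tends to touch fewer pages. Types with non-trivial destructors would still need those destructors run before the arena goes away.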

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory leak in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (a vector of shared_ptr to object, see below) is not fully reclaimed when the parent container goes out of scope, and afterwards it can't be used except for small objects.
The minimal example: when the program starts, I can allocate a single contiguous block of 1.5 GB without a problem. After I use the memory somewhat (by creating and destroying a number of small objects), I can no longer allocate a big block.
Test program:
#include <iostream>
#include <memory>
#include <vector>
using namespace std;
class BigClass
{
private:
    double a[10000];
};
void TestMemory() {
    cout << "Performing TestMemory" << endl;
    vector<shared_ptr<BigClass>> list;
    for (int i = 0; i < 10000; i++) {
        shared_ptr<BigClass> p(new BigClass());
        list.push_back(p);
    }
}
void TestBigBlock() {
    cout << "Performing TestBigBlock" << endl;
    char* bigBlock = new char[1024*1024*1536];
    delete[] bigBlock;
}
int main() {
    TestBigBlock();
    TestMemory();
    TestBigBlock();
}
The problem also reproduces when using plain pointers with new/delete or malloc/free in a loop instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of the number of times I call it). As a consequence, there's no free VM region big enough to hold 1.5 GB. But I'm not sure why, since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA
Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back. This also has the happy side effect of being very fast compared with deleting a zillion blocks in a loop.
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).
There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure (regarding the memory leak point), I ran this program under Valgrind, and it did not report any leaks.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Here are my responses to the points raised in the original question.
The culprit seems to be that after TestMemory(), the application's
virtual memory stays at 827125760 (regardless of number of times I
call it).
Yes, the real culprit is hidden fragmentation created during the TestMemory() function. To explain fragmentation, I have taken this snippet from Wikipedia:
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The paragraph above explains memory fragmentation very nicely. Some allocation patterns (such as frequent allocation and deallocation) lead to memory fragmentation, but the end impact (i.e. the 1.5 GB allocation failing) varies greatly between systems, as different OSes and heap managers have different strategies and implementations.
As an example, your program ran perfectly fine on my machine (Linux), whereas you encountered the memory allocation failure.
Regarding your observation that the VM size remains constant: the VM size seen in Task Manager is not directly proportional to our memory allocation calls. It mainly depends on how many bytes are in the committed state. When you allocate some dynamic memory (using new/malloc) and do not write to or initialize those memory regions, they do not go into the committed state, and hence the VM size is not affected. VM size depends on many other factors and is a bit complicated, so we should not rely on it completely when trying to understand the dynamic memory allocation of our program.
As a consequence, there's no free VM region big enough to hold 1.5
GB.
Yes, due to fragmentation, there is no contiguous 1.5 GB region. Note that the total remaining (free) memory may well be more than 1.5 GB, but it is in a fragmented state, so no single contiguous region is big enough.
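A toy model makes the difference between "total free" and "largest contiguous free" easy to see. This is a sketch under assumptions: the address space is modelled as equal-sized slots, and FreeStats, measure and checkerboard are hypothetical names. It does not reproduce what the Windows heap actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FreeStats {
    std::size_t total_free;   // number of free slots overall
    std::size_t largest_run;  // longest run of adjacent free slots
};

// in_use[i] is true when slot i of the toy address space is allocated.
FreeStats measure(const std::vector<bool>& in_use) {
    FreeStats stats{0, 0};
    std::size_t run = 0;
    for (bool used : in_use) {
        if (used) {
            run = 0;
        } else {
            ++stats.total_free;
            ++run;
            stats.largest_run = std::max(stats.largest_run, run);
        }
    }
    assert(stats.largest_run <= stats.total_free);
    return stats;
}

// Allocate every slot, then free every other one.
std::vector<bool> checkerboard(std::size_t slots) {
    std::vector<bool> in_use(slots, true);
    for (std::size_t i = 0; i < slots; i += 2) in_use[i] = false;
    return in_use;
}
```

For checkerboard(1000), half of the space is free, yet the largest request that can be satisfied is a single slot.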
But I'm not sure why - since I'm definitely freeing the memory I used.
Is it some "performance optimization" CRT does to minimize OS calls?
I have explained why this may happen even though you have freed all your memory. To fulfil the user program's request, the heap memory manager asks the OS's virtual memory manager for more memory. But whether that additional memory can be obtained depends on many other complex factors which are not easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse allocated memory rather than frequently allocating and freeing it. There could be some patterns (like allocating particular request sizes in a particular order) which lead the overall memory into a fragmented state. Reducing fragmentation may require a substantial design change in your program. This is a complex topic, and understanding the complete root cause requires knowledge of the memory manager's internals.
However, there are tools on Windows-based systems, which I am not very familiar with. I did find one excellent SO post about which tools (on Windows) can help you understand and check the fragmentation status of your program yourself.
https://stackoverflow.com/a/1684521/2724703
This is not a memory leak. The memory you used was allocated by the C/C++ runtime. The runtime requests a bulk of memory from the OS once, and then each new you call is served from that bulk memory. When you delete an object, the runtime does not return the memory to the OS immediately; it may hold on to that memory for performance.
There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is however one contiguous memory area, so changing the size is just changing where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one little tiny object in that space will prevent its shrinking. On Linux you can lookup the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general-purpose heap, an extra block of memory is created. When it is deleted, this memory can actually be "returned" to the OS, since it's guaranteed nothing is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
1- TestBigBlock allocates and deletes an array. Assume this uses a special memory block, so the memory is returned to the OS.
2- TestMemory extends the heap for all the small objects and never returns any heap to the OS. Here the heap is entirely available from the application's point of view, but from the OS's point of view it is assigned to the application.
3- TestBigBlock now fails, since although it would use a special memory block, it shares the overall memory space with the heap, and there just isn't enough left after 2 is complete.

C++ When to allocate on heap vs stack?

Whilst asking another question (and also before) I was wondering how do I judge whether to create an object on the heap or keep it as an object on the stack? What should I ask myself about the object to make the correct allocation?
Put it on the heap if you have to, the stack if you can.
What kinds of things do you need to put on the heap? Anything of varying length. Any object that might need to be null. Anything that's very large, lest you cause a stack overflow.
Simple answer.
When it goes out of scope, do you want it to hang around and be able to use it?
Depends on intended lifetime of the object.
If you want the object to be alive even after function returns, then HEAP, else STACK
If an object is placed in the HEAP, then it must be explicitly free()'ed or deleted by the programmer, once its usage is over; otherwise the program will be leaking memory.
Stack memory is fast. It is fast because (a) there is no system overhead to allocate the memory - the allocation is done by simply moving the stack pointer in one instruction and (b) the memory in the stack is "hot" so it is already in cache. Heap memory is slow because (a) it requires a lot of system work to look around and find a free chunk of memory and (b) is probably not in cache and will require evicting some data you might have wanted.
Stack memory doesn't get fragmented. It is possible that a heap eventually gets so fragmented, you can't allocate anything (even though ironically there is still enough unused memory!)
For long lived data and for large data (multi KB or more), you have to use a heap.
The danger of allocating a bigger stack is that it might hurt you if you are running multiple threads. You have to size the stack for the "worst case" usage. Each thread requires its own stack. On a high core count machine (where you might have 200+ threads running), you may not want to arbitrarily increase the stack. The heap, on the other hand, does not need to be sized for "worst case" usage, so it is much more efficient.
Two reasons to use the heap:
1- You want the data after the current scope.
2- You want to reserve large memory.
Other than that stay on stack.
Note: don't reserve a lot of memory on the stack, or you'll get a "Stack-overflow" ;)

C++ Stack Walking on Windows

I'm building a memory manager for C++ using a very .NET style approach. In doing so I need to know which objects are considered reachable; an object is considered reachable if a reachable object has a handle to the object in question. So this poses the question of which object(s) are the root of our search? The answer would be that these "eve" objects are on the stack, be it in the form of a handle to a managed object or an instance of a scope-local object that itself has a handle to a managed object.
I've read through some articles on this and also checked out implementation details on the MSDN about the StackWalk method in the Win32 API.
As always any help is greatly appreciated. And please don't advise against making a memory manager, or suggest alternatives such as smart pointers. I fully understand what I am doing. Thanks!
Your requirements sort of seem similar to a small project I’m working on at the moment, but my goal isn’t to make a memory manager, my goal is to instrument dmalloc (and the debug-mode long-running application within which it is running) with the ability to periodically halt execution and scan memory looking for heap allocations for which there are no references. Sort of like a “dumb” garbage collector, but not with the goal of freeing memory; instead, with the goal of logging leaked allocations for later analysis (along with stacktraces captured at allocation-time, which I’ve already added to dmalloc). Note that as a general-purpose memory manager’s garbage collector, this will be a pretty inefficient process and will take a “long” time to run (I’m not done yet, but I won’t be surprised if each time it runs it halts normal program execution for over 10 seconds), but for my own purposes I don’t care too much about performance because I’ll enable it only once every few months to test for new memory leaks in my company’s product.
In any case, I assume your memory manager will be the only source of heap memory in your application? And that threads in your system operate in a fully shared-memory environment, where no thread has any memory, including stack space and thread-local storage space, that cannot be seen from other threads? If so...
I believe there are just four categories of memory within which you may find pointers to heap allocations:
1- On the callstacks of each thread
2- Within heap allocations themselves
3- In statically allocated writable memory (.bss & .data/.sdata, but not .rdata/.rodata)
4- In thread-local storage space for each thread
You are already aware that pointers to heap allocations may occur on the stack. Pointers to allocations may also (may instead) be stored in heap objects themselves, and not even stored on the stack. Your question suggests you may be hoping to use the stack as a “root” of your garbage collector’s search; I’m taking this to mean you hope to be able to follow pointers on the stack outwards to other allocations, searching from one object to another through memory until you’ve traversed all objects in memory and found all pointers to all allocations. "Root" pointers may also exist in statically allocated objects, which can be referenced directly without there even being a pointer to such an object on the stack, so you can't just assume all allocations are reachable from "pointers" you find in the stack. Also, unfortunately with C++, unless you’re able to know the structure of each allocation (which you won’t without help from the compiler), you’ll have to assume that any location is possibly a pointer. So you’ll have to scan through each of these four categories of memory looking for potential pointers to all existing allocations, flagging each with a “possibly still in use” flag if you find a value in memory that matches the address of an allocation, whether or not it’s actually a pointer. As you scan through memory, at each byte location (or at each byte location evenly divisible by sizeof(void*), if you know your platform can’t have pointers at misaligned addresses), you’ll have to search your list of allocations to see if that value is in your list of allocations.
Since you're confident that you know what you’re doing, your memory manager is probably tracking these allocations in a balanced tree structure (perhaps a red-black tree or Andersson tree) which gives you O(log n) insertion & lookup on those allocations, but the constant of proportionality for navigating those trees is going to really kill your garbage collector’s performance. Before doing your garbage collection scan, you’ll want to copy the tree’s allocation pointers into a flat contiguous buffer (i.e. an “array”) in order (i.e. ascending or descending using inorder traversal). I suggest an array of void* of each allocation’s address and a separate bit-array (not bool array) with one bit per allocation, initialized to all-zeros, where an allocation’s corresponding bit is set to 1 if you find a potential reference to it. This will still give you O(log n) lookup (using binary search) while you’re scanning for garbage collection, but with a much more manageable constant of proportionality for your lookups; in addition, this more compact data structure will tend to have better cache hit performance than a balanced tree.
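The flat array plus bit-array layout suggested above could look roughly like this. It is a sketch under assumptions: AllocationIndex and its member names are hypothetical, addresses are handled as std::uintptr_t values, and only values equal to the start address of an allocation count as a match (interior pointers are ignored).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Flat snapshot of live allocations used during a conservative scan.
class AllocationIndex {
public:
    // `addresses` must be sorted ascending (e.g. from an in-order tree walk).
    explicit AllocationIndex(std::vector<std::uintptr_t> addresses)
        : addresses_(std::move(addresses)),
          marks_((addresses_.size() + 63) / 64, 0) {
        assert(std::is_sorted(addresses_.begin(), addresses_.end()));
    }

    // If `value` equals the start of a tracked allocation, set its bit.
    // Lookup is a binary search over the contiguous array: O(log n).
    bool mark_if_allocation(std::uintptr_t value) {
        auto it = std::lower_bound(addresses_.begin(), addresses_.end(), value);
        if (it == addresses_.end() || *it != value) return false;
        std::size_t i = static_cast<std::size_t>(it - addresses_.begin());
        marks_[i / 64] |= std::uint64_t{1} << (i % 64);
        return true;
    }

    // True if a potential reference to allocation i has been seen.
    bool is_marked(std::size_t i) const {
        return ((marks_[i / 64] >> (i % 64)) & 1u) != 0;
    }

private:
    std::vector<std::uintptr_t> addresses_;  // sorted allocation addresses
    std::vector<std::uint64_t> marks_;       // one bit per allocation
};
```

The scanner would call mark_if_allocation on every pointer-sized value it reads. Allocations whose bit is still clear at the end are the leak candidates.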
Now I’ll discuss each of the four categories of memory you’d have to scan:
The callstacks of each thread
For this, you’ll have to be able to query your thread manager for the top & bottom of each thread’s stacks. If you can only get the current stack pointer for each thread, then you may be able to use a “backtrace” API to get a list of function return addresses on that stack. From that, you can scan back toward each stack’s base (which you don’t know), ticking off each return address in order until you get to the last return address, where you’ve then found the stack base (or close enough). And for the “current thread”, be sure to not include any stackframes associated with your memory manager; i.e., back up a few stackframes & ignore the ones associated with your garbage collector, or else you might find addresses of leaked allocations in your garbage collector’s local variables and mistake them for
Within heap allocations themselves
Heap objects can reference each other, and you could have a network of leaked objects that all reference each other yet as a group, they are leaked. You don't want to see their pointers to each other & treat them as "in-use", so you have to handle these carefully... and last. Once all other categories are finished, you can collapse/split your flat array of void* allocation addresses, making a separate list of "considered in-use" allocations and "not yet verified" allocations. Scan through the "considered in-use" allocations looking for potential pointers to allocations still in the "not yet verified" list. As you find any, move them from the "not yet verified" list to the end of the "considered in-use" list so that you'll eventually scan those as well.
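The two-list procedure described above is essentially a worklist traversal. Here is a minimal sketch under assumptions: allocations are identified by index, the references found inside each allocation have already been collected into `refs` by the scan, and mark_reachable is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// refs[i] lists the allocations referenced from inside allocation i.
// roots are allocations referenced from stacks, static data, or TLS.
// Returns in_use[i] == true for every allocation reachable from a root.
std::vector<bool> mark_reachable(
    const std::vector<std::vector<std::size_t>>& refs,
    const std::vector<std::size_t>& roots) {
    std::vector<bool> in_use(refs.size(), false);
    std::vector<std::size_t> considered;  // growing "considered in-use" list
    for (std::size_t r : roots) {
        assert(r < refs.size());
        if (!in_use[r]) {
            in_use[r] = true;
            considered.push_back(r);
        }
    }
    // Scan each in-use allocation once; newly found ones go to the end of
    // the list so that they are scanned later as well.
    for (std::size_t next = 0; next < considered.size(); ++next) {
        for (std::size_t target : refs[considered[next]]) {
            assert(target < refs.size());
            if (!in_use[target]) {
                in_use[target] = true;
                considered.push_back(target);
            }
        }
    }
    return in_use;
}
```

Anything left unmarked stays in the "not yet verified" set, including groups of leaked objects that only reference each other.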
In statically allocated writable memory (.bss & .data/.sdata, but not .rdata/.rodata)
For this, you’ll need to get symbols from your linker to the start & end (or length) of each of these sections. If such symbols don’t already exist or you can’t get that information from a platform API, you’ll need to get your linker command script (linker script) and modify it to add & initialize global symbols to the start address & end address (or length) of each of these sections. The .bss section contains uninitialized global, file scope, and class static data members. The .data/.sdata section(s) contain non-const pre-initialized global, file scope, and class static data members. You don’t need to worry about the .rdata/.rodata section(s) because your program won’t be writing heap-allocation addresses into static const data.
In thread-local storage space for each thread
For this, you’ll have to be able to query your thread manager for the thread-local storage space for each thread, or else part of the startup of each thread must be to add its thread-local storage to a list of thread-local space for the application, and remove it when the thread exits.
If you’re still on board and want to do this, by now you’ve probably realized it’s a bigger project than you may have initially thought. Let me know how it goes!