How does a computer 'know' what memory is allocated? - c++

When memory is allocated in a computer, how does it know which bytes are already occupied and can't be overwritten?
So if these are some bytes of memory that aren't being used:
[0|0|0|0]
How does the computer know whether they are in use or not? They could just be an integer that equals zero, or they could be empty memory. How does it know?

That depends on the way the allocation is performed, but it generally involves manipulation of data belonging to the allocation mechanism.
When you allocate some variable in a function, the allocation is performed by decrementing the stack pointer. Via the stack pointer, your program knows that anything below the stack pointer is not allocated to the stack, while anything above the stack pointer is allocated.
When you allocate something via malloc() etc. on the heap, things are similar, but more complicated: all these allocators have some internal data structures which they never expose to the calling application, but which allow them to select which memory addresses to return on an allocation request. Some malloc() implementations, for instance, use a number of memory pools for small objects of fixed size, and maintain a linked list of free objects for each fixed size they track. That way, they can quickly pop one memory region off that list, only doing more expensive computations when they run out of regions to satisfy a certain request size.
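As a rough illustration of such a pool with a free list, here is a minimal sketch (not how any particular malloc() is actually written; the class and function names are invented, and alignment handling is omitted for brevity). The point is that "what is free" is recorded entirely in the allocator's own bookkeeping, by threading the free chunks themselves into a list:

#include <cstddef>
#include <cstdlib>
#include <new>

// A pool of fixed-size chunks whose free chunks are linked into a free list.
// The bytes of a free chunk are reused to store the "next free" pointer.
class FixedPool {
public:
    FixedPool(std::size_t chunk_size, std::size_t chunk_count)
        : chunk_size_(chunk_size < sizeof(void*) ? sizeof(void*) : chunk_size)
    {
        buffer_ = static_cast<char*>(std::malloc(chunk_size_ * chunk_count));
        if (!buffer_) throw std::bad_alloc();
        for (std::size_t i = 0; i < chunk_count; ++i)   // thread every chunk
            push(buffer_ + i * chunk_size_);            // onto the free list
    }
    ~FixedPool() { std::free(buffer_); }

    void* allocate() {                       // pop one chunk off the free list
        if (!free_list_) return nullptr;     // pool exhausted
        void* chunk = free_list_;
        free_list_ = *static_cast<void**>(free_list_);
        return chunk;
    }
    void deallocate(void* p) { push(p); }    // push the chunk back

private:
    void push(void* p) {
        *static_cast<void**>(p) = free_list_;
        free_list_ = p;
    }
    std::size_t chunk_size_;
    char*       buffer_    = nullptr;
    void*       free_list_ = nullptr;
};

With something like FixedPool pool(32, 1024);, handing out a 32-byte chunk is then just an O(1) pop from the list, and returning it is a push.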
In any case, each of these allocators has to request memory from the system kernel from time to time. This mechanism always works on complete memory pages (usually 4 KiB), and works via the syscalls brk() and mmap(). Again, the kernel keeps track of which pages are visible in which processes, and at which addresses they are mapped, so there is additional memory allocated inside the kernel for this.
These mappings are made available to the processor via the page tables, which it uses to resolve virtual memory addresses to physical addresses. So here, finally, you have some hardware involved in the process, but that is really far, far down in the guts of the mechanics, much below anything that a userspace process is ever able to see. Still, even the page tables are managed by the kernel's software, not by the hardware; the hardware only interprets what the software writes into the page tables.

First of all, I have the impression that you believe that there is some unoccupied memory that doesn't hold any value. That's wrong. You can imagine memory as a very large array where each box contains a value, whether or not someone has put something in it. If a memory location was never written to, it contains a random (indeterminate) value.
Now to answer your question: it's not the computer (meaning the hardware) but the operating system. It holds, somewhere in its own memory, tables recording which parts of memory are used. Also, any byte of memory can be overwritten.

In general, you cannot tell by looking at content of memory at some location whether that portion of memory is used or not. Memory value '0' does not mean the memory is not used.
To tell what portions of memory are used you need some structure to tell you this. For example, you can divide memory into chunks and keep track of which chunks are used and which are not.
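For illustration, a minimal sketch of that idea (the class name and chunk granularity are arbitrary): the bookkeeping lives in a separate bit-per-chunk table, not in the chunks themselves, which is exactly why a zero byte in memory tells you nothing about whether it is allocated.

#include <cstddef>
#include <vector>

// Track which fixed-size chunks are in use with one bit per chunk.
class ChunkMap {
public:
    explicit ChunkMap(std::size_t chunk_count) : used_(chunk_count, false) {}

    // Returns the index of a free chunk and marks it used, or -1 if full.
    long allocate() {
        for (std::size_t i = 0; i < used_.size(); ++i)
            if (!used_[i]) { used_[i] = true; return static_cast<long>(i); }
        return -1;
    }
    void release(std::size_t i) { used_[i] = false; }

private:
    std::vector<bool> used_;   // the bookkeeping lives outside the chunks
};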

There are memory blocks, and each is either occupied or not occupied. On the heap, fairly complex data structures organise this. But a full answer to your question would be too broad.

Related

Sequential memory allocation

I'm planning an application that allocates a lot of variables in memory. Unlike a "regular" application, I want this memory to be allocated in specific memory blocks of 4096 bytes. My allocated variables must be placed in memory sequentially, one after another, in order to fill the whole allocated memory.
For example, I allocate a region (4096 bytes) in memory, and this region is ready for my further use. From now on, each time my application creates a new variable in memory (which in a "regular" application would probably be done with malloc), this variable will be placed in the free space of my memory region.
This sequential memory allocation is similar to how an array allocation works. But in my case, I need an array that is able to contain many types of data (string, byte, int, ...).
One possible way to achieve this is pointer arithmetic. I want to avoid this method, since it may introduce a lot of bugs into my application.
Maybe someone solved this problem before?
Thank you!
malloc() by no means guarantees that subsequently allocated blocks are at sequential memory addresses. Even worse, most implementations use a small number of bytes before and/or after the allocated block for 'housekeeping'. This means that, even if you're lucky and the addresses are sequential, there will be small gaps between the blocks: the actual allocated blocks are slightly bigger to make space for those 'housekeeping' bytes.
As you suggest, you'll need to write some code yourself: a few wrapper functions around malloc(), realloc(), and so on. You can hide all the logic in these functions, and application code using them need not be any more complex than code using malloc() directly, provided they do what you want.
Important questions: Why do you need to have these blocks adjacent to each other? What about freeing blocks?
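For what it's worth, here is a minimal sketch of such a wrapper: a "bump" allocator that hands out space sequentially from 4096-byte regions obtained with malloc(). It is only an illustration of the idea (alignment is ignored, previous regions are never tracked or freed, and the names are invented):

#include <cstddef>
#include <cstdlib>

struct Arena {
    static const std::size_t kRegionSize = 4096;

    void* allocate(std::size_t n) {
        if (n > kRegionSize) return nullptr;           // too big for one region
        if (!region_ || used_ + n > kRegionSize) {     // start a fresh region
            region_ = static_cast<char*>(std::malloc(kRegionSize));
            used_ = 0;
            if (!region_) return nullptr;
        }
        void* p = region_ + used_;
        used_ += n;                                    // bump the offset
        return p;
    }

    char*       region_ = nullptr;
    std::size_t used_   = 0;
};

Each call to allocate() just advances an offset within the current region, so consecutive allocations are laid out one after another, which is exactly the behaviour malloc() alone does not guarantee.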

Why can't we declare an array, say of int data type, of any size within the memory limit?

int A[10000000];                             // this gives a segmentation fault
int *A = (int*)malloc(10000000*sizeof(int)); // goes through without any seg fault
Now my question, just out of curiosity: if we are ultimately able to allocate that much space for our data structures (for example, BSTs and linked lists built with pointers in C have no memory limit as such, unless the total size exceeds the RAM of the machine, and the second statement above, which allocates through a pointer, works fine), why can't we have an array declared with a similarly large size (until it reaches the memory limit)? Is this because the space allocated for a statically sized array must be contiguous? But then, where do we get the guarantee that no other piece of code is using the next 1000000 words of RAM?
PS: I may be wrong in some of the statements I made. Please correct me in that case.
Firstly, in a typical modern OS with virtual memory (Linux, Windows etc.) the amount of RAM makes no difference whatsoever. Your program is working with virtual memory, not with RAM. RAM is just a cache for virtual memory access. The absolute limiting factor for maximum array size is not RAM, it is the size of the available address space. Address space is the resource you have to worry about in OSes with virtual memory. In 32-bit OSes you have 4 gigabytes of address space, part of which is taken up for various housekeeping needs and the rest is available to you. In 64-bit OSes you theoretically have 16 exabytes of address space (less than that in practical implementations, since CPUs usually use fewer than 64 bits to represent the address), which can be regarded as practically unlimited.
Secondly, the amount of available address space in a typical C/C++ implementation depends on the memory type. There's static memory, there's automatic memory, there's dynamic memory. The address space limits for each memory type are pre-set in advance by the compiler. Which raises the question: where are you declaring your large array? Which memory type? Automatic? Static? You provided no information, but this is absolutely necessary. If you are attempting to declare it as a local variable (automatic memory), then no wonder it doesn't work, since automatic memory (aka "stack memory") has very limited address space assigned to it. Your array simply does not fit. Meanwhile, malloc allocates dynamic memory, which normally has the largest amount of address space available.
Thirdly, many compilers provide you with options that control the initial distribution of address space between different kinds of memory. You can request a much larger stack size for your program by manipulating such options. Quite possibly you can request a stack so large that your local array will fit in it without any problems. But in practice, for obvious reasons, it makes very little sense to declare huge arrays as local variables.
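As an illustration of the memory types mentioned above, here is a minimal sketch (assuming roughly 40 MB for ten million ints; the names are made up for the example) of where the same large array can live:

#include <vector>

int global_array[10000000];          // static memory: not on the stack, fine

void f()
{
    // int local_array[10000000];    // automatic memory: ~40 MB on the stack,
    //                               // far beyond typical 1-8 MB limits
    static int s[10000000];          // static memory, even inside a function: fine
    std::vector<int> v(10000000);    // dynamic memory (heap): fine
    (void)s; (void)v;                // silence unused-variable warnings
}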
Assuming local variables, this is because on modern implementations automatic variables will be allocated on the stack, which is very limited in space. This link gives some of the common stack sizes:
platform        default size
============================
SunOS/Solaris   8172K bytes
Linux           8172K bytes
Windows         1024K bytes
cygwin          2048K bytes
The linked article also notes that the stack size can be changed; for example, on Linux, one possible way from the shell before running your process would be:
ulimit -s 32768 # sets the stack size to 32M bytes
malloc, on the other hand, will on modern implementations allocate from the heap, which is only limited by the memory available to the process, and in many cases you can even allocate more than is available due to overcommit.
I THINK you're missing the difference between total memory and your program's memory space. Your program runs in an environment created by your operating system. The OS grants a specific memory range to the program, and the program has to work within it.
The catch: Your compiler can't 100% know the size of this range.
That means your compiler will build successfully, and the program will REQUEST that much room in memory when the time comes to make the call to malloc (or to move the stack pointer when the function is called). When the function is called (creating a stack frame), you'll get a segmentation fault, caused by the stack overflow. When malloc is called, you won't get a segfault unless you try USING the memory. (If you look at the manpage for malloc() you'll see it returns NULL when there's not enough memory.)
To explain the two failures: your program is granted two memory spaces, the stack and the heap. Memory allocated using malloc() is obtained via a system call and lives on the heap of your program. The request is dynamically accepted or rejected, and either the start address or NULL is returned, depending on success or failure. The stack is used when you call a function: room for all the local variables is made on the stack by program instructions. Calling a function can't just FAIL, as that would break program flow completely; instead the system says "you're now overstepping" and segfaults, stopping the execution.
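To make the difference concrete, here is a small hedged example (the sizes are arbitrary): heap allocation reports failure through its return value, while a too-large local array simply overflows the stack. Note that on systems with overcommit, the malloc/new call may "succeed" and the failure only appear when the memory is first touched.

#include <cstdio>
#include <cstdlib>
#include <new>

int main()
{
    // Heap allocation can report failure; a huge local array cannot.
    int* a = static_cast<int*>(std::malloc(10000000 * sizeof(int)));
    if (a == NULL) {                            // malloc() returns NULL on failure
        std::puts("malloc failed");
        return 1;
    }
    int* b = new (std::nothrow) int[10000000];  // non-throwing new also returns NULL
    if (b == NULL) {
        std::puts("new failed");
    }
    std::free(a);
    delete[] b;                                 // deleting NULL is harmless
    return 0;
}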

Is there any benefit to use multiple heaps for memory management purposes?

I am a student of a system software faculty. Now I'm developing a memory manager for Windows. Here's my simple implementation of malloc() and free():
#include <windows.h>   // HeapCreate, HeapAlloc, HeapFree
#include <cstddef>     // size_t

HANDLE heap = HeapCreate(0, 0, 0);

void* hmalloc(size_t size)
{
    return HeapAlloc(heap, 0, size);
}

void hfree(void* memory)
{
    HeapFree(heap, 0, memory);
}

int main()
{
    int* ptr1 = (int*)hmalloc(100 * sizeof(int));
    int* ptr2 = (int*)hmalloc(100 * sizeof(int));
    int* ptr3 = (int*)hmalloc(100 * sizeof(int));
    hfree(ptr2);
    hfree(ptr3);
    hfree(ptr1);
    return 0;
}
It works fine. But I can't understand whether there is a reason to use multiple heaps. I can allocate memory on the heap and get the address of an allocated memory chunk, but here I use ONE heap. Is there a reason to use multiple heaps? Maybe for multi-threaded/multi-process applications? Please explain.
The main reason for using multiple heaps/custom allocators is better memory control. Usually, after lots of news and deletes the memory gets fragmented and the application loses performance (the app will also consume more memory). Using memory in a more controlled way can reduce heap fragmentation.
Another usage is preventing memory leaks in the application: you can just free the entire heap you allocated, without having to bother with freeing every object allocated in it.
Yet another usage is for tightly packed allocations: if you have, for example, a list, you can allocate all its nodes from a smaller dedicated heap, and the app gains performance because there are fewer cache misses when iterating over the nodes.
Edit: memory management is, however, a hard topic, and in some cases it is not done right. Andrei Alexandrescu gave a talk at one point in which he said that for some applications, replacing the custom allocator with the default one increased the performance of the application.
This is a good link that elaborates on why you may need multiple heaps:
https://caligari.dartmouth.edu/doc/ibmcxx/en_US/doc/libref/concepts/cumemmng.htm
"Why Use Multiple Heaps?
Using a single runtime heap is fine for most programs. However, using multiple
heaps can be more efficient and can help you improve your program's performance
and reduce wasted memory for a number of reasons:
1- When you allocate from a single heap, you may end up with memory blocks on
different pages of memory. For example, you might have a linked list that
allocates memory each time you add a node to the list. If you allocate memory for
other data in between adding nodes, the memory blocks for the nodes could end up
on many different pages. To access the data in the list, the system may have to
swap many pages, which can significantly slow your program.
With multiple heaps, you can specify which heap you allocate from. For example,
you might create a heap specifically for the linked list. The list's memory blocks
and the data they contain would remain close together on fewer pages, reducing the
amount of swapping required.
2- In multithread applications, only one thread can access the heap at a time to
ensure memory is safely allocated and freed. For example, say thread 1 is
allocating memory, and thread 2 has a call to free. Thread 2 must wait until
thread 1 has finished its allocation before it can access the heap. Again, this
can slow down performance, especially if your program does a lot of memory
operations.
If you create a separate heap for each thread, you can allocate from them
concurrently, eliminating both the waiting period and the overhead required to
serialize access to the heap.
3- With a single heap, you must explicitly free each block that you allocate. If you
have a linked list that allocates memory for each node, you have to traverse the
entire list and free each block individually, which can take some time.
If you create a separate heap for that linked list, you can destroy it with a
single call and free all the memory at once.
4- When you have only one heap, all components share it (including the IBM C and
C++ Compilers runtime library, vendor libraries, and your own code). If one
component corrupts the heap, another component might fail. You may have trouble
discovering the cause of the problem and where the heap was damaged.
With multiple heaps, you can create a separate heap for each component, so if
one damages the heap (for example, by using a freed pointer), the others can
continue unaffected. You also know where to look to correct the problem."
A reason would be a scenario where you need to execute code internally, e.g. running simulation code. By creating your own heap, you could give that heap execution rights, which by default are turned off for security reasons (on Windows).
You have some good thoughts, and this would work for C, but in C++ you have destructors, and it is VERY important that they run.
You can think of all types as having constructors/destructors, just ones that logically "do nothing".
This is about allocators. See "the buddy algorithm", which uses powers of two to align and reuse blocks.
If I allocate 4 bytes somewhere, my allocator might allocate a 4 KB section just for 4-byte allocations. That way I can fit 1024 4-byte things in the block; if I need more, I add another block, and so forth.
Ask it for 4 KB and it won't allocate that in the 4-byte block; it might have a separate block for larger requests.
This means you can keep big things together and small things together. Otherwise, if I allocate 17 bytes, then 13 bytes, then 1 byte, and the 13-byte block gets freed, I can only fit something of <= 13 bytes into that gap.
Hence the buddy system and powers of 2: it is easy to do using left shifts. If I want a 2.5 KB block, I allocate the smallest power of 2 that will fit (4 KB in this case); that way I can reuse the slot afterwards for <= 4 KB items.
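For instance, the "smallest power of 2 that will fit" step can be sketched with shifts like this (a toy helper, not taken from any particular allocator):

#include <cstddef>

// Round a request up to the next power of two, as a buddy-style allocator
// would when picking a block size class.
std::size_t round_up_pow2(std::size_t n)
{
    std::size_t size = 1;
    while (size < n)
        size <<= 1;        // e.g. 2560 bytes (2.5 KB) -> 4096, 17 -> 32
    return size;
}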
This is not garbage collection; it is just about keeping things compact and neat. Using your own allocator can avoid calls to the OS (depending on your compiler's default implementation of new and delete, they might already do this) and make new/delete very quick.
Heap compacting is very different: you need a list of every pointer that points into your heap, or some way to traverse the entire memory graph (like, say, Java has), so that when you move things around and "compact" them, you can update everything that pointed to a moved object with its new location.
The only time I ever used more than one heap was when I wrote a program that would build a complicated data structure. It would have been non-trivial to free the data structure by walking through it and freeing the individual nodes, but luckily for me the program only needed the data structure temporarily (while it performed a particular operation), so I used a separate heap for the data structure so that when I no longer needed it, I could free it with one call to HeapDestroy.
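A minimal sketch of that pattern on Windows (the node type and counts are invented for the example): build a throwaway structure in its own heap, then release everything with a single HeapDestroy instead of walking the nodes.

#include <windows.h>

struct Node { int value; Node* next; };

int main()
{
    HANDLE scratch = HeapCreate(0, 0, 0);            // dedicated, temporary heap
    Node* head = NULL;
    for (int i = 0; i < 1000; ++i) {                 // allocate many nodes
        Node* n = (Node*)HeapAlloc(scratch, 0, sizeof(Node));
        if (n == NULL) break;
        n->value = i;
        n->next = head;
        head = n;
    }
    // ... use the data structure ...
    HeapDestroy(scratch);                            // frees every node at once
    return 0;
}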

C++ Stack Walking on Windows

I'm building a memory manager for C++ using a very .NET style approach. In doing so I need to know which objects are considered reachable; an object is considered reachable if a reachable object has a handle to the object in question. So this poses the question: which object(s) are the root of our search? The answer would be that these "eve" objects are on the stack, be it in the form of a handle to a managed object or an instance of a scope-local object that itself has a handle to a managed object.
I've read through some articles on this and also checked out implementation details on the MSDN about the StackWalk method in the Win32 API.
As always any help is greatly appreciated. And please don't advise against making a memory manager, or suggest alternatives such as smart pointers. I fully understand what I am doing. Thanks!
Your requirements sort of seem similar to a small project I’m working on at the moment, but my goal isn’t to make a memory manager, my goal is to instrument dmalloc (and the debug-mode long-running application within which it is running) with the ability to periodically halt execution and scan memory looking for heap allocations for which there are no references. Sort of like a “dumb” garbage collector, but not with the goal of freeing memory; instead, with the goal of logging leaked allocations for later analysis (along with stacktraces captured at allocation-time, which I’ve already added to dmalloc). Note that as a general-purpose memory manager’s garbage collector, this will be a pretty inefficient process and will take a “long” time to run (I’m not done yet, but I won’t be surprised if each time it runs it halts normal program execution for over 10 seconds), but for my own purposes I don’t care too much about performance because I’ll enable it only once every few months to test for new memory leaks in my company’s product.
In any case, I assume your memory manager will be the only source of heap memory in your application? And that threads in your system operate in a fully shared-memory environment, where no thread has any memory, including stack space and thread-local storage space, that cannot be seen from other threads? If so...
I believe there are just four categories of memory within which you may find pointers to heap allocations:
On the callstacks of each thread
Within heap allocations themselves
In statically allocated writable memory (.bss & .data/.sdata, but not .rdata/.rodata)
In thread-local storage space for each thread
You are already aware that pointers to heap allocations may occur on the stack. Pointers to allocations may also (may instead) be stored in heap objects themselves, and not even stored on the stack. Your question suggests you may be hoping to use the stack as a "root" of your garbage collector's search; I'm taking this to mean you hope to be able to follow pointers on the stack outwards to other allocations, searching from one object to another through memory until you've traversed all objects in memory and found all pointers to all allocations.
"Root" pointers may also exist in statically allocated objects, which can be referenced directly without there even being a pointer to such an object on the stack, so you can't just assume all allocations are reachable from "pointers" you find in the stack. Also, unfortunately with C++, unless you're able to know the structure of each allocation (which you won't without help from the compiler), you'll have to assume that any location is possibly a pointer.
So you'll have to scan through each of these four categories of memory looking for potential pointers to all existing allocations, flagging each with a "possibly still in use" flag if you find a value in memory that matches the address of an allocation, whether or not it's actually a pointer. As you scan through memory, at each byte location (or at each location evenly divisible by sizeof(void*), if you know your platform can't have pointers at misaligned addresses), you'll have to search your list of allocations to see if that value matches the address of an allocation.
Since you're confident that you know what you’re doing, your memory manager is probably tracking these allocations in a balanced tree structure (perhaps a red-black tree or Andersson tree) which gives you O(log n) insertion & lookup on those allocations, but the constant of proportionality for navigating those trees is going to really kill your garbage collector’s performance. Before doing your garbage collection scan, you’ll want to copy the tree’s allocation pointers into a flat contiguous buffer (i.e. an “array”) in order (i.e. ascending or descending using inorder traversal). I suggest an array of void* of each allocation’s address and a separate bit-array (not bool array) with one bit per allocation, initialized to all-zeros, where an allocation’s corresponding bit is set to 1 if you find a potential reference to it. This will still give you O(log n) lookup (using binary search) while you’re scanning for garbage collection, but with a much more manageable constant of proportionality for your lookups; in addition, this more compact data structure will tend to have better cache hit performance than a balanced tree.
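A hedged sketch of that flattened lookup structure (the names are mine, not from any existing library):

#include <algorithm>
#include <cstdint>
#include <vector>

// Sorted array of allocation addresses plus one mark bit per allocation.
struct AllocTable {
    std::vector<std::uintptr_t> addrs;   // sorted ascending (from an inorder walk)
    std::vector<bool> maybe_used;        // packed bit-array, one bit per allocation

    // If 'value' read from scanned memory matches an allocation's address,
    // flag that allocation as possibly still in use. O(log n) binary search.
    void mark_if_allocation(std::uintptr_t value) {
        std::vector<std::uintptr_t>::const_iterator it =
            std::lower_bound(addrs.begin(), addrs.end(), value);
        if (it != addrs.end() && *it == value)
            maybe_used[it - addrs.begin()] = true;
    }
};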
Now I'll discuss each of the four categories of memory you'd have to scan:
The callstacks of each thread
For this, you'll have to be able to query your thread manager for the top & bottom of each thread's stack. If you can only get the current stack pointer for each thread, then you may be able to use a "backtrace" API to get a list of function return addresses on that stack. From that, you can scan back toward each stack's base (which you don't know), ticking off each return address in order until you get to the last return address, where you've then found the stack base (or close enough). And for the "current thread", be sure not to include any stackframes associated with your memory manager; i.e., back up a few stackframes & ignore the ones associated with your garbage collector, or else you might find addresses of leaked allocations in your garbage collector's local variables and mistake them for allocations that are still in use.
Within heap allocations themselves
Heap objects can reference each other, and you could have a network of leaked objects that all reference each other yet as a group, they are leaked. You don't want to see their pointers to each other & treat them as "in-use", so you have to handle these carefully... and last. Once all other categories are finished, you can collapse/split your flat array of void* allocation addresses, making a separate list of "considered in-use" allocations and "not yet verified" allocations. Scan through the "considered in-use" allocations looking for potential pointers to allocations still in the "not yet verified" list. As you find any, move them from the "not yet verified" list to the end of the "considered in-use" list so that you'll eventually scan those as well.
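A rough sketch of that "considered in-use" / "not yet verified" pass (allocation records are simplified to a base address and size; this only shows the worklist idea):

#include <cstddef>
#include <cstring>
#include <vector>

struct Alloc { char* base; std::size_t size; };      // one heap allocation

// Does allocation 'a' contain a pointer-sized value equal to 'target'?
static bool contains_pointer_to(const Alloc& a, const void* target)
{
    for (std::size_t off = 0; off + sizeof(void*) <= a.size; off += sizeof(void*)) {
        void* value;
        std::memcpy(&value, a.base + off, sizeof(void*));
        if (value == target) return true;
    }
    return false;
}

// Anything referenced from an in-use allocation becomes in-use itself and
// will be scanned in a later iteration of the outer loop.
static void scan_heap(std::vector<Alloc>& in_use, std::vector<Alloc>& unverified)
{
    for (std::size_t i = 0; i < in_use.size(); ++i) {      // in_use grows as we go
        for (std::size_t j = 0; j < unverified.size(); ) {
            if (contains_pointer_to(in_use[i], unverified[j].base)) {
                in_use.push_back(unverified[j]);
                unverified.erase(unverified.begin() + j);
            } else {
                ++j;
            }
        }
    }
}

// Whatever is left in 'unverified' afterwards is a candidate leak.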
In statically allocated writable memory (.bss & .data/.sdata, but not .rdata/.rodata)
For this, you’ll need to get symbols from your linker to the start & end (or length) of each of these sections. If such symbols don’t already exist or you can’t get that information from a platform API, you’ll need to get your linker command script (linker script) and modify it to add & initialize global symbols to the start address & end address (or length) of each of these sections. The .bss section contains uninitialized global, file scope, and class static data members. The .data/.sdata section(s) contain non-const pre-initialized global, file scope, and class static data members. You don’t need to worry about the .rdata/.rodata section(s) because your program won’t be writing heap-allocation addresses into static const data.
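As an assumption-laden aside: on GNU/Linux toolchains the C library already exposes symbols marking these boundaries (see "man 3 end"), while on Windows you would instead enumerate the module's sections through the PE headers. A tiny sketch of the GNU case:

#include <cstdio>

extern "C" {
    extern char etext, edata, end;   // provided by the GNU linker, not by a header
}

int main()
{
    std::printf("text ends at %p\n", static_cast<void*>(&etext));
    std::printf("data ends at %p\n", static_cast<void*>(&edata));
    std::printf("bss  ends at %p\n", static_cast<void*>(&end));
    return 0;
}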
In thread-local storage space for each thread
For this, you’ll have to be able to query your thread manager for the thread-local storage space for each thread, or else part of the startup of each thread must be to add its thread-local storage to a list of thread-local space for the application, and remove it when the thread exits.
If you’re still on board and want to do this, by now you’ve probably realized it’s a bigger project than you may have initially thought. Let me know how it goes!

C++ new / new[], how is it allocating memory?

I would like to know how these expressions allocate memory.
For example, say I have this code:
x = new int[5];
y = new int[5];
When these are allocated, what does it actually look like in RAM?
Is a whole block reserved for each of the variables, or is a block (a memory page, or whatever you call it; 4 KB in size on 32-bit) shared between the 2 variables?
I couldn't find an answer to my question in any manual. Thanks for all replies.
I found on wikipedia:
Internal fragmentation of pages
Rarely do processes require the use of an exact number of pages. As a result, the last page will likely only be partially full, wasting some amount of memory. Larger page sizes clearly increase the potential for wasted memory this way, as more potentially unused portions of memory are loaded into main memory. Smaller page sizes ensure a closer match to the actual amount of memory required in an allocation.
As an example, assume the page size is 1024KB. If a process allocates 1025KB, two pages must be used, resulting in 1023KB of unused space (where one page fully consumes 1024KB and the other only 1KB).
And that was the answer to my question. Anyway, thanks guys.
A typical allocator implementation will first call the operating system to get a huge block of memory, and then, to satisfy your request, it will give you a piece of that memory. This is known as suballocation. If it runs out of memory, it gets more from the operating system.
The allocator must keep track of both all the big blocks it got from the operating system and all the small blocks it handed out to its clients. It must also accept blocks back from clients.
A typical suballocation algorithm keeps a list of returned blocks of each size called a freelist and always tries to satisfy a request from the freelist, only going to the main block if the freelist is empty. This particular implementation technique is extremely fast and quite efficient for average programs, though it has woeful fragmentation properties if request sizes are all over the place (which is not usual for most programs).
Modern allocators like GNU's malloc implementation are complex, but have been built with many decades of experience and should be considered so good that it is very rare to need to write your own specialised suballocator.
You didn't find it in the manual because it's not specified by the standard. That said, most of the time x and y will be side by side (go ahead and cout << hex << their addresses).
But nothing in the standard forces this so you can't rely on it.
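For example, a quick (non-portable, result not guaranteed) way to look at the addresses yourself:

#include <iostream>

int main()
{
    int* x = new int[5];
    int* y = new int[5];
    // Typically (but with no guarantee) the two blocks are close together,
    // separated only by the allocator's bookkeeping bytes.
    std::cout << static_cast<void*>(x) << " " << static_cast<void*>(y) << "\n";
    delete[] x;
    delete[] y;
    return 0;
}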
Each process has several segments, divided among the process address space:
1) The text segment :: Where your code is placed
2) Stack Segment :: Process stack
3) Data Segment :: This is where memory allocated by "new" is reserved (the heap traditionally grows from the end of the data segment). Besides that, it also stores initialized and uninitialized static data (.bss etc.).
So, whenever you use new (which, I guess, uses malloc internally, but the new operator makes memory handling much safer), it allocates the specified number of bytes there.
Of course, the address you print while running the program is virtual and needs to be translated to a physical address, but that is not our headache: the OS and the memory management unit do that for us.
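A small hedged example of looking at those segments from a program (the exact layout is platform-dependent and the variable names are arbitrary):

#include <cstdio>

int initialized_global = 1;         // data segment
int uninitialized_global;           // BSS

int main()
{
    int local = 0;                  // stack
    int* heap_value = new int(0);   // heap, via operator new

    std::printf("data : %p\n", static_cast<void*>(&initialized_global));
    std::printf("bss  : %p\n", static_cast<void*>(&uninitialized_global));
    std::printf("heap : %p\n", static_cast<void*>(heap_value));
    std::printf("stack: %p\n", static_cast<void*>(&local));

    delete heap_value;
    return 0;
}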