What is the difference between the reserve and commit arguments to CreateThread? - c++

What is the difference between the reserve argument and the commit argument to the CreateThread Windows API function?
I can't understand the following lines:
The reserve argument sets the amount of address space the system should reserve for the thread's stack. The default is 1 MB.
The commit argument specifies the amount of physical storage that should be initially committed to the stack's reserved region.
You will find these two lines in the following paragraph, which explains one of the parameters of the CreateThread function in C++:
cbStackSize
The cbStackSize parameter specifies how much address space the thread can use for its own stack. Every thread owns its own stack. When CreateProcess starts a process, it internally calls CreateThread to initialize the process' primary thread. For the cbStackSize parameter, CreateProcess uses a value stored inside the executable file. You can control this value using the linker's /STACK switch:
/STACK:[reserve][,commit]
The reserve argument sets the amount of address space the system should reserve for the thread's stack. The default is 1 MB.
The commit argument specifies the amount of physical storage that should be initially committed to the stack's reserved region.

The distinction is the one between virtual and physical memory.
In any operating system worthy of that name, including Windows, pointers don't designate locations on the memory chip directly. They are locations in a process-specific virtual memory space, and the operating system allocates parts of the physical memory chip on demand to hold the content of the parts where the process actually stores anything. And it may swap some data out to disk when it runs out of RAM.
The reserve is the size of the contiguous virtual memory block to allocate for the stack. Other things will be stored below and above that range, so the reserve puts an upper limit on how big the stack can grow.
Fortunately, virtual memory is usually plentiful. You have 2 GiB on 32-bit Windows, 3 GiB if you link with the /LARGEADDRESSAWARE flag, and a huge amount if you compile for 64 bits (x64). The only exception is WinCE before 5.0, where you have only 32 MiB. So unless you are creating zillions of threads, you can be generous here, and you should be, because if you don't reserve enough, the process will crash.
The commit is the amount of physical storage the system should preallocate for the stack. This makes the system immediately claim some space in physical memory, which is a shared resource and may be scarce; it may need to swap out or discard its previous content to get it. When you exceed the commit, the system automatically scrambles for more, at the cost of a small delay. So the only thing you gain by increasing this value is a small speed-up if you actually need the memory, and a slow-down if you don't. You should be conservative here.
The stack is where local variables are placed. If you use large local buffers (often reasonable, since stack allocation is much faster than heap allocation via malloc/new/anything that uses std::allocator), you need to reserve enough stack. If you don't, the default 1 MiB is usually plenty.
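For illustration, here is a minimal sketch of setting the reserve per thread at run time instead of through the linker switch (the Worker function and the sizes are made up for this example). With the STACK_SIZE_PARAM_IS_A_RESERVATION flag, CreateThread's dwStackSize parameter is treated as the reserve; without the flag it is the initial commit:

#include <windows.h>
#include <cstdint>

// Hypothetical thread function with a large local buffer, which is why a
// larger-than-default reserve is wanted.
DWORD WINAPI Worker(LPVOID)
{
    uint8_t buffer[2 * 1024 * 1024];
    buffer[0] = 0;
    return 0;
}

int main()
{
    // Reserve 4 MiB of address space for this thread's stack instead of
    // the default value stored in the executable header.
    HANDLE hThread = CreateThread(NULL, 4 * 1024 * 1024, Worker, NULL,
                                  STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
    if (hThread) {
        WaitForSingleObject(hThread, INFINITE);
        CloseHandle(hThread);
    }
    return 0;
}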

The reserve places a ceiling on how much stack space the thread can have; the commit places a floor under it. The stack starts out consuming the commit amount of memory and stops growing when it hits the reserve.

Every process has an address space, and the stack of each thread is located somewhere in this space. When the thread is created, the OS allocates a piece of address space of size reserve.
But it does not assign any real memory to all of this space; it assigns only the commit amount of memory.
The stack can grow over time and the OS will add more pages to it, extending the committed amount. But it cannot grow indefinitely: it can never grow bigger than the reserve.

Related

Can I create an array exceeding RAM, if I have enough swap memory?

Let's say I have 8 Gigabytes of RAM and 16 Gigabytes of swap memory. Can I allocate a 20 Gigabyte array there in C? If yes, how is it possible? What would that memory layout look like?
Yes, you can. Note that accessing swap is veerry slooww.
how is it possible
Allocate dynamic memory. The operating system handles the rest.
What would that memory layout look like?
On an amd64 system, you can have 256 TiB of address space. You can easily fit a contiguous 20 GiB block in that space. The operating system divides the virtual memory into pages and copies the pages between physical memory and swap space as needed.
Modern operating systems use virtual memory. In Linux and most other OSes, each process has its own address space, sized according to the abilities of the architecture. You can check the size of the virtual address space in /proc/cpuinfo. For example, you may see:
address sizes : 43 bits physical, 48 bits virtual
This means that virtual addresses use 48 bits. Half of that is reserved for the kernel, so you can only use 47 bits, or 128 TiB. Any memory you allocate will be placed somewhere in those 128 TiB of address space, as if you actually had that much memory.
Linux uses demand paging and by default overcommits memory. When you say
char *mem = (char*)malloc(1'000'000'000'000);
what happens is that Linux picks a suitable address and just records that you have allocated 1'000'000'000'000 bytes (rounded up to the nearest page) of memory starting at that point. (It does a sanity check that the amount isn't totally bonkers, depending on the amount of physical memory that is free, the amount of swap that is free, and the overcommit setting. By default you can allocate a lot more than you have memory and swap.)
Note that at this point no physical memory and no swap space is connected to your allocated block at all. This changes when you first write to the memory:
mem[4096] = 0;
At this point the program page-faults. Linux checks that the address is actually something your program is allowed to write to, finds a physical page and maps it at &mem[4096]. Then it lets the program retry the write, and everything continues.
If Linux can't find a physical page, it will try to swap something out to make one available for your program. If that also fails, your program will receive a SIGSEGV and likely die.
As a result, you can allocate basically unlimited amounts of memory, as long as you never write to more than physical memory and swap can back. On the other hand, if you initialize the memory (explicitly, or implicitly by using calloc()), the system will quickly notice if you try to use more than is available.
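A minimal sketch of this behavior (Linux-specific, assuming a 64-bit system with overcommit enabled, which is the default):

#include <cstdio>
#include <cstdlib>

int main()
{
    // Ask for 1 TB of address space. With overcommit this usually succeeds
    // even on a machine with far less RAM and swap.
    size_t size = 1'000'000'000'000;
    char *mem = static_cast<char*>(std::malloc(size));
    if (mem == nullptr) {
        std::perror("malloc");
        return 1;
    }
    // Touch one byte per 4 KiB page of the first gigabyte: only the pages
    // actually written get backed by physical memory (or swap). Writing all
    // of them would eventually exhaust RAM + swap and invoke the OOM killer.
    for (size_t i = 0; i < 1'000'000'000; i += 4096)
        mem[i] = 0;
    std::puts("touched the first gigabyte");
    std::free(mem);
    return 0;
}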
You can, but not with a simple malloc. It's platform-dependent.
It requires an OS call to allocate swappable memory (VirtualAlloc on Windows, for example; on Linux it would be mmap and related functions).
Once that's done, the allocated memory is divided into pages, contiguous blocks of fixed size. You can lock a page, whereupon it is loaded into RAM and you can read and modify it freely. For old dinosaurs like me, it's exactly how EMS memory worked under DOS... You address your swappable memory with a kind of segment:offset method: first you divide your linear address by the page size to find which page is needed, then you use the remainder to get the offset within that page.
Once unlocked, the page remains in memory until the OS needs memory: then an unlocked page will be flushed to disk, into swap, and discarded from RAM... until you lock (and load...) it again. But this operation may require freeing RAM first, so another process may have its unlocked pages swapped out BEFORE your own page is loaded again. And this is damnably SLOW... even on an SSD!
So it's not always a good thing to use swap. A better way is to use memory-mapped files (perfect for reading very big files mostly sequentially, with few random accesses), if that suits your needs.
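On Linux, a minimal sketch of reserving a swappable block larger than RAM straight from the OS (anonymous mmap; the size is made up for the example):

#include <sys/mman.h>
#include <cstdio>

int main()
{
    // Reserve 20 GiB of anonymous, swap-backed memory directly from the OS.
    size_t size = 20ull * 1024 * 1024 * 1024;
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }
    // Pages materialize lazily on first write, exactly as with malloc; under
    // memory pressure the kernel pushes cold pages out to swap.
    static_cast<char*>(p)[0] = 42;
    munmap(p, size);
    return 0;
}

For file-backed access, the same mmap call with a file descriptor instead of MAP_ANONYMOUS gives you the memory-mapped-file approach mentioned above.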

De-initializing a region of memory

I have learned in the past few days about the issue of memory overcommitment (when memory overcommit is activated, which is usually the default), which basically means that with:
void* p = malloc(100);
the operating system gives you 100 contiguous (virtual) addresses taken from the (virtual) address space of your process, whose total range is OS-defined. Since that memory region has not been initialized yet, it doesn't count as occupied storage from a system-wide point of view, so it's a pure abstraction apart from consuming your virtual addresses.
memset(p, 0, 5);
That uses the first 5 bytes, so from the point of view of the OS your process now occupies 5 extra bytes, and the system has 5 bytes less of free storage. You still have 95 bytes of uninitialized storage.
The system only crashes or starts killing processes when the combined occupied (initialized) storage of every process goes beyond what the OS can hold.
If my understanding is right in this regard, is there a way to "de-"initialize a region of memory when you are done with it, in order to increase the system-wide free space, without losing the address region requested by malloc or aligned_malloc (so you don't increase fragmentation over time)?
The purpose of this question is more theoretical than practical; it's not about actually "freeing memory", but about freeing memory while keeping the already-assigned virtual addresses.
Source about the difference between requesting virtual addresses and ocuppying storage: https://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6
PS: An answer for Linux is enough to satisfy my curiosity.
No, there is no way.
On most systems, as soon as you allocate memory, it counts towards RAM or swap.
As your link shows, on Linux, you may need to access the memory once so that the memory actually gets allocated. But as soon as you do, the system must keep that memory available somewhere, in case you access it later.
The way to tell the system you are done with the memory is to actually free it.

Stack memory not released

I have the following loop, which pops a C++ concurrent queue I have, from the implementation here. https://juanchopanzacpp.wordpress.com/2013/02/26/concurrent-queue-c11/
while (!interrupted)
{
    pxData data = queue->pop();
    if (data.value == -1)
    {
        break; // exit loop on terminating condition
    }
    usleep(7000); // stub to simulate processing
}
I am looking at the memory history using System Monitor in CentOS7.
I'm trying to free up the memory taken up by the queue, after reading the value from the queue. However, as the following while loop runs, I don't see the memory usage going down. I've verified that the queue length does go down.
It does go down, however, when -1 is encountered and the loop exits (the program is still running). But I can't have this, because where usleep is, I want to do some intensive processing.
Question: Why doesn't the memory occupied by data get freed (according to System Monitor)? Isn't stack-allocated memory supposed to be freed when the variable goes out of scope?
The struct is defined as follows, and populated at the beginning of the program.
typedef struct pxData
{
    float value; // -1 value terminates the loop
    float x, y, z;
    std::complex<float> valueData[65536];
} pxData;
It's populated with ~10000 pxData, which roughly translates to 5 GB. The system only has ~8 GB.
So it's important that the memory is freed up for other processing in the system.
There are a few things at play here.
Virtual Memory
First, you need to understand that just because your program is "using" 5 GB of memory does not mean that there are only 3 GB of RAM left for other programs. Virtual memory means that those 5 GB might be only 1 GB of actual "resident" data, and the other 4 GB may actually be on disk rather than in RAM. So it's important to look at the "resident set size" rather than the "virtual size" when you're looking at your program. And note that if your system actually runs low on RAM, the OS may shrink the RSS of some programs by "paging out" some of their memory. So don't worry too much about "5 GB" appearing in the system monitor--worry if you have a real, concrete performance problem.
Heap Allocation
The second aspect is why your virtual size does not decrease as you remove items from the queue. We can guess that you put those elements into the queue by creating them with malloc or new one-by-one, then pushing them onto the back of the queue. This means that the first element you allocated will come out of the queue first. And that in turn means that when you have drained 90% of the queue, your memory allocation might look like this:
[program|------------------unused-------------------|pxData]
The problem here is that in the real world, just because you free or delete something does not mean the operating system instantly reclaims that memory. In fact, it may not be able to reclaim any unused spans unless they are at the "end" (i.e. most recently allocated). Since C++ does not have garbage collection and cannot move items around in memory without your consent, you end up with this big "hole" in your program's virtual memory. That hole would be used to satisfy future memory allocation requests, but if you haven't got any, it just sits there, until the queue is completely empty:
[program|------------------unused--------------------------]
Then the system is able to shrink your virtual address space back down:
[program]
Which brings you back to where you started.
Solutions
If you want to "fix" this, one option is to allocate your memory in "reverse", i.e. put the last items allocated into the front of the queue.
Another option is to allocate the elements for the queue via mmap, which is something that e.g. Linux will do automatically for allocations which are "large." You can change the threshold for this by calling mallopt(3) with M_MMAP_THRESHOLD and setting it to be a little bit smaller than your struct size. This makes the allocations independent of each other, so the OS can reclaim them individually. This technique can even be applied to existing programs without recompilation, so is often useful if you need to solve this problem in a program you cannot modify.
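As a sketch of the mallopt route (glibc-specific; the threshold here assumes the ~512 KiB pxData struct from the question):

#include <malloc.h>   // mallopt, M_MMAP_THRESHOLD (glibc)

int main()
{
    // Ask glibc to satisfy any allocation of ~512 KiB or more with its own
    // private mmap. Such blocks are returned to the OS individually on free,
    // so draining the queue immediately shrinks the process footprint.
    mallopt(M_MMAP_THRESHOLD, 512 * 1024);
    // ... allocate and free pxData elements with new/delete as before ...
    return 0;
}

For the no-recompilation case, glibc also reads the MALLOC_MMAP_THRESHOLD_ environment variable (note the trailing underscore) at startup, with the same effect.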
A C++ implementation would call some operator delete to release dynamically allocated (using some operator new) memory. In several C++ standard libraries, new calls malloc and delete calls free.
(I am focusing with a Linux point of view, but the principles are similar on other OSes)
But while malloc (or ::operator new) sometimes asks the OS kernel for more memory through system calls that change the virtual address space, such as mmap(2), free (or ::operator delete) often simply marks the released memory zone as available again to future calls to malloc (or to new).
So from the kernel's point of view (e.g. as seen through /proc/, see proc(5)...), the virtual address space is not changing and the memory remains consumed, even though inside the application it is marked as "freed" and will be reused by some future allocation (future calls to malloc or new).
And most C++ standard containers use heap data internally. In particular, a local (stack-allocated) std::map or std::vector (or std::deque) variable will call new & delete for its internal data.
BTW, I find your declaration quite strange. Unless every struct pxData has exactly 65536 used valueData slots, I would suggest using a std::vector, so you have
std::vector<std::complex<float>> valueData;
and improve your code accordingly. You'll probably need some valueData.reserve(somesize); and/or valueData.resize(somesize); and/or valueData.push_back(somecomplexnumber); etc.
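A sketch of what that change might look like (the reserve size is just the original array's capacity; adjust it to what is actually used):

#include <complex>
#include <vector>

struct pxData
{
    float value; // -1 value terminates the loop
    float x, y, z;
    std::vector<std::complex<float>> valueData; // sized at runtime, heap-backed
};

// usage:
pxData d;
d.valueData.reserve(65536);           // only if all slots are really needed
d.valueData.push_back({1.0f, 0.0f});  // or just grow on demand

Because the vector's storage is released when the element is destroyed, popping the queue then genuinely hands the big buffers back to the allocator.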

Why can't we declare an array, say of int data type, of any size within the memory limit?

int A[10000000];                             // this gives a segmentation fault
int *A = (int*)malloc(10000000*sizeof(int)); // goes through without any segfault
Now my question, just out of curiosity, is this: data structures built with the pointer approach in C, such as BSTs and linked lists, have no memory limit as such (unless the total size exceeds the RAM of our machine), and neither does the second statement above that declares a pointer. Why, then, can't we have an array declared of a higher size (until it reaches the memory limit)? Is this because the space allocated for a statically sized array must be contiguous? But then where do we get the guarantee that in the next 1000000 words of RAM no other piece of code is running...??
PS: I may be wrong in some of the statements I made... please correct me in that case.
Firstly, in a typical modern OS with virtual memory (Linux, Windows etc.) the amount of RAM makes no difference whatsoever. Your program is working with virtual memory, not with RAM. RAM is just a cache for virtual memory access. The absolute limiting factor for maximum array size is not RAM, it is the size of the available address space. Address space is the resource you have to worry about in OSes with virtual memory. In 32-bit OSes you have 4 gigabytes of address space, part of which is taken up for various household needs and the rest is available to you. In 64-bit OSes you theoretically have 16 exabytes of address space (less than that in practical implementations, since CPUs usually use less than 64 bits to represent the address), which can be perceived as practically unlimited.
Secondly, the amount of available address space in a typical C/C++ implementation depends on the memory type. There's static memory, there's automatic memory, there's dynamic memory. The address space limits for each memory type are pre-set in advance by the compiler. Which raises the question: where are you declaring your large array? Which memory type? Automatic? Static? You provided no information, but this is absolutely necessary. If you are attempting to declare it as a local variable (automatic memory), then no wonder it doesn't work, since automatic memory (aka "stack memory") has very limited address space assigned to it. Your array simply does not fit. Meanwhile, malloc allocates dynamic memory, which normally has the largest amount of address space available.
Thirdly, many compilers provide you with options that control the initial distribution of address space between different kinds of memory. You can request a much larger stack size for your program by manipulating such options. Quite possibly you can request a stack so large that your local array will fit in it without any problems. But in practice, for obvious reasons, it makes very little sense to declare huge arrays as local variables.
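A small sketch of the three memory types this answer distinguishes (array size taken from the question):

#include <cstdlib>

int global_A[10000000];             // static memory: fine, not on the stack

void f()
{
    // int local_A[10000000];       // automatic memory: ~40 MB of locals
    //                              // blows through a 1-8 MiB default stack
    static int static_A[10000000];  // static memory: also fine
    int *heap_A = static_cast<int*>(            // dynamic memory: the
        std::malloc(10000000 * sizeof(int)));   // largest budget of all
    std::free(heap_A);
    (void)static_A;
}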
Assuming local variables, this is because on modern implementations automatic variables will be allocated on the stack, which is very limited in space. This link gives some of the common stack sizes:
platform        default size
============================
SunOS/Solaris   8172K bytes
Linux           8172K bytes
Windows         1024K bytes
cygwin          2048K bytes
The linked article also notes that the stack size can be changed; for example in Linux, one possible way from the shell, before running your process, would be:
ulimit -s 32768 # sets the stack size to 32M bytes
malloc on modern implementations, on the other hand, will allocate from the heap, which is limited only by the memory available to the process; in many cases you can even allocate more than is available, due to overcommit.
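Besides ulimit, the stack size for a new thread can be set programmatically; a minimal POSIX sketch (compile with -pthread):

#include <pthread.h>

void *worker(void *)
{
    int big[10000000]; // ~40 MB of locals: fits only because of the big stack
    big[0] = 0;
    return nullptr;
}

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 64 * 1024 * 1024); // 64 MiB stack
    pthread_t t;
    if (pthread_create(&t, &attr, worker, nullptr) == 0)
        pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
    return 0;
}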
I THINK you're missing the difference between total memory and your program's memory space. Your program runs in an environment created by your operating system, which grants a specific memory range to the program, and the program has to deal with that.
The catch: Your compiler can't 100% know the size of this range.
That means your compiler will build successfully, and the program will REQUEST that much room in memory when the time comes to make the call to malloc (or to move the stack pointer when the function is called). When the function is called (creating a stack frame), you'll get a segmentation fault caused by the stack overflow. When malloc is called, you won't get a segfault unless you try USING the memory. (If you look at the manpage for malloc() you'll see it returns NULL when there's not enough memory.)
To explain the two failures: your program is granted two memory spaces, the stack and the heap. Memory allocated using malloc() is requested with a system call and created on your program's heap. The system dynamically accepts or rejects the request and returns either the start address or NULL, depending on success or failure. The stack is used when you call a function: room for all the local variables is made on the stack by program instructions. Calling a function can't just FAIL, as that would break program flow completely. So instead the system says "you're now overstepping" and segfaults, stopping the execution.
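A sketch of the two failure modes side by side (the heap path reports failure; the stack path just faults):

#include <cstdio>
#include <cstdlib>

int main()
{
    // Heap path: malloc can fail gracefully and report it via NULL.
    int *A = static_cast<int*>(std::malloc(10000000 * sizeof(int)));
    if (A == nullptr) {
        std::fputs("allocation refused\n", stderr);
        return 1;
    }
    A[0] = 42; // with overcommit, physical backing is found here, not above
    std::free(A);
    return 0;
    // Stack path: a too-large local array has no such escape hatch; the
    // function call itself faults when the stack's guard page is hit.
}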

Can CreateThread interfere with VirtualAlloc usage?

Is it possible for stack space allocated by CreateThread to interfere with the usage of VirtualAlloc? I can't find any discussion or documentation explaining precisely where stack space is allowed to be allocated...
The following more precisely illustrates my question:
uint8_t *baseA = (uint8_t*)VirtualAlloc(NULL,1,MEM_RESERVE,PAGE_NOACCESS);
// Create a thread with the default stack size
HANDLE hThread = CreateThread(NULL,0,SomeThreadProc,NULL,0,NULL);
// Possibly create even more threads here.
// Can this ever fail in the absence of other allocators? It doesn't here...
uint8_t *baseB = (uint8_t*)VirtualAlloc(NULL,1,MEM_RESERVE,PAGE_NOACCESS);
// Furthermore, in this test, baseB-baseA == 65536 (unless the debugger did something),
// so nothing appeared between baseA and baseB... not even enough space for the
// full 64kb of wastage, as baseA points to 4096 bytes by itself
If it does in fact use some analogue of VirtualAlloc, is there a way to change how Windows allocates stack space in a given process?
Stack space can be allocated anywhere in the address space of the process. There is no documentation on this now and it is unlikely that such documentation will appear in the future.
You can safely assume that thread creation and VirtualAlloc are independent. If that were not the case, a lot of things would be broken. The allocator cannot give out overlapping address ranges; that is unthinkable. The problem is somewhere else.
The only thing that might look like a correlation is the amount of memory used and virtual address space fragmentation. In that case the latest request will simply fail.
I have worked on memory analysis utilities; what I write here is based on real experience.
[Image: distribution of the number of virtual allocations by allocation size]
[Image: example of the address space contents for a 32-bit process (blue: committed, magenta: reserved, green: free memory)]
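A sketch of how to see this for yourself: walking the process address space with VirtualQuery shows exactly where thread stacks and VirtualAlloc reservations land relative to one another:

#include <windows.h>
#include <cstdint>
#include <cstdio>

int main()
{
    // Walk the whole user address space and print each region, producing a
    // committed/reserved/free map like the one described above.
    MEMORY_BASIC_INFORMATION mbi;
    uint8_t *p = nullptr;
    while (VirtualQuery(p, &mbi, sizeof(mbi)) != 0) {
        const char *state =
            mbi.State == MEM_COMMIT  ? "committed" :
            mbi.State == MEM_RESERVE ? "reserved"  : "free";
        std::printf("%p %10zu %s\n", mbi.BaseAddress, mbi.RegionSize, state);
        p = static_cast<uint8_t*>(mbi.BaseAddress) + mbi.RegionSize;
    }
    return 0;
}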
The Windows NT kernel handles memory allocation operations at a high priority, and in a thread-safe manner.
That means only one thread of a process can allocate memory at a time, which makes all allocations thread-safe (in theory).
So there shouldn't be any interference between stack allocation and virtual allocation.
Also keep in mind that you can allocate 1 GB of space while your program still only uses its 2 MB of RAM.
That's because Windows "pre-allocates" virtual space but doesn't assign physical memory to it until you use it (write to it).
Actually, memory management is a lot more complicated than that, but for now you can be sure that no allocation operations should ever interfere, since Windows serializes your process's allocation requests, delaying all other threads' requests for as long as one allocation is being processed (lock contention).
EDIT: That also means that allocation and de-allocation is a rather expensive process if you allocate millions of small bits. It's always better to allocate/de-allocate larger memory areas because of this locking behavior.